Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor...

11
Hindawi Publishing Corporation International Journal of Evolutionary Biology Volume 2013, Article ID 818954, 10 pages http://dx.doi.org/10.1155/2013/818954 Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor Genes of Mouse and Human Rosa Calvello, Antonia Cianciulli, and Maria Antonietta Panaro Department of Biosciences, Biotechnologies and Biopharmaceutics, University of Bari, Via Orabona 4, 70126 Bari, Italy Correspondence should be addressed to Maria Antonietta Panaro; [email protected] Received 10 July 2013; Revised 18 October 2013; Accepted 26 October 2013 Academic Editor: Y. Satta Copyright © 2013 Rosa Calvello et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conservation/mutation in the intronic initial and terminal hexanucleotides was studied in 26 orthologous cytokine receptor genes of Mouse and Human. Introns began and ended with the canonical dinucleotides GT and AG, respectively. Identical configurations were found in 57% of the 5 hexanucleotides and 28% of the 3 hexanucleotides. e actual conservation percentages of the individual variable nucleotides at each position in the hexanucleotides were determined, and the theoretical rates of conservation of groups of three nucleotides were calculated under the hypothesis of a mutual evolutionary independence of the neighboring nucleotides (random association). Analysis of the actual conservation of groups of variable nucleotides showed that, at 5 , GTGAGx was significantly more expressed and GTAAGx was significantly less expressed, as compared to the random association. At 3 , TTTxAG and xTGCAG were overexpressed as compared to a random association. Study of Mouse and Human transcript variants involving the splice sites showed that most variants were not inherited from the common ancestor but emerged during the process of speciation. In some variants the silencing of a terminal hexanucleotide determined skipping of the downstream exon; in other variants the constitutive splicing hexanucleotide was replaced by another potential, in-frame, splicing hexanucleotide, leading to alterations of exon lengths. 1. Introduction In most protein-coding genes of eukaryotes the coding exons alternate with noncoding introns. e nuclear pre-mRNA is a transcript of the whole gene, including the introns. However, before it is exported to the cytoplasm, the introns are removed through a process called pre-mRNA splicing and the exons orderly joined to form the mature coding mRNA. Cutting at splicing sites is usually accomplished with a high degree of precision, as needed for the synthesis of the correct protein products in the process of translation. Precision splicing requires the existence of specific sequence arrangements at appropriate pre-mRNA sites (signals) and is affected by a massive ribonucleoprotein complex, the spliceosome, which has evolved to interact with these sequences. Most oſten the splicing signal is univocal and robust enough to allow one single splicing pattern only at each site. But when the splicing signals are less robust or possibly not univocal, “physiological” alternative splicing patterns may occur, with total or partial deletion of some exons or retention of in- frame introns resulting in alterations of the encoded protein product. In most of these cases the generated protein either retains a similar function to that of the default protein or may acquire a different biological function [13]. In other instances “unsuited” mRNAs are prevented from crossing the nuclear membrane, a selection structure which emerged in eukaryotes to separate intron-containing RNAs from the translation apparatus. In addition, other cellular systems may degrade irregularly spliced or mutated mRNAs (nonsense- mediated decay, NMD) [46] or, eventually, altered proteins may be ubiquitinated and proteasome-degraded [7]. How- ever, in some cases, despite all these control mechanisms, an irregular splicing or a disruption of the physiological alternative splicing may impair cell functions and bring about severe illnesses [810]. Special strategies to allow some unspliced virus-derived mRNAs (required for the synthesis of some envelope and capsid proteins) to be exported out of the nucleus are adopted by human immunodeficiency virus type 1 [11]. e main intronic splicing signals are the 5 splice site (5 ss) or splice donor site; the branch site; the polypyrimidine

Transcript of Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor...

Page 1: Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor …downloads.hindawi.com/archive/2013/818954.pdf · 2019. 7. 31. · transforming growth factor beta

Hindawi Publishing CorporationInternational Journal of Evolutionary BiologyVolume 2013 Article ID 818954 10 pageshttpdxdoiorg1011552013818954

Research ArticleConservationMutation in the Splice Sites of Cytokine ReceptorGenes of Mouse and Human

Rosa Calvello Antonia Cianciulli and Maria Antonietta Panaro

Department of Biosciences Biotechnologies and Biopharmaceutics University of Bari Via Orabona 4 70126 Bari Italy

Correspondence should be addressed to Maria Antonietta Panaro mariaantoniettapanarounibait

Received 10 July 2013 Revised 18 October 2013 Accepted 26 October 2013

Academic Editor Y Satta

Copyright copy 2013 Rosa Calvello et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Conservationmutation in the intronic initial and terminal hexanucleotides was studied in 26 orthologous cytokine receptor genesofMouse andHuman Introns began and ended with the canonical dinucleotides GT and AG respectively Identical configurationswere found in 57 of the 51015840 hexanucleotides and 28 of the 31015840 hexanucleotides The actual conservation percentages of theindividual variable nucleotides at each position in the hexanucleotides were determined and the theoretical rates of conservationof groups of three nucleotides were calculated under the hypothesis of a mutual evolutionary independence of the neighboringnucleotides (random association) Analysis of the actual conservation of groups of variable nucleotides showed that at 51015840 GTGAGxwas significantly more expressed and GTAAGx was significantly less expressed as compared to the random association At 31015840TTTxAG and xTGCAG were overexpressed as compared to a random association Study of Mouse and Human transcript variantsinvolving the splice sites showed that most variants were not inherited from the common ancestor but emerged during the processof speciation In some variants the silencing of a terminal hexanucleotide determined skipping of the downstream exon in othervariants the constitutive splicing hexanucleotide was replaced by another potential in-frame splicing hexanucleotide leading toalterations of exon lengths

1 Introduction

In most protein-coding genes of eukaryotes the coding exonsalternate with noncoding intronsThe nuclear pre-mRNA is atranscript of the whole gene including the introns Howeverbefore it is exported to the cytoplasm the introns are removedthrough a process called pre-mRNA splicing and the exonsorderly joined to form the mature coding mRNA Cutting atsplicing sites is usually accomplished with a high degree ofprecision as needed for the synthesis of the correct proteinproducts in the process of translation Precision splicingrequires the existence of specific sequence arrangements atappropriate pre-mRNA sites (signals) and is affected by amassive ribonucleoprotein complex the spliceosome whichhas evolved to interact with these sequences Most oftenthe splicing signal is univocal and robust enough to allowone single splicing pattern only at each site But when thesplicing signals are less robust or possibly not univocalldquophysiologicalrdquo alternative splicing patterns may occur withtotal or partial deletion of some exons or retention of in-frame introns resulting in alterations of the encoded protein

product In most of these cases the generated protein eitherretains a similar function to that of the default protein ormay acquire a different biological function [1ndash3] In otherinstances ldquounsuitedrdquo mRNAs are prevented from crossingthe nuclear membrane a selection structure which emergedin eukaryotes to separate intron-containing RNAs from thetranslation apparatus In addition other cellular systemsmaydegrade irregularly spliced or mutated mRNAs (nonsense-mediated decay NMD) [4ndash6] or eventually altered proteinsmay be ubiquitinated and proteasome-degraded [7] How-ever in some cases despite all these control mechanismsan irregular splicing or a disruption of the physiologicalalternative splicing may impair cell functions and bringabout severe illnesses [8ndash10] Special strategies to allow someunspliced virus-derived mRNAs (required for the synthesisof some envelope and capsid proteins) to be exported out ofthe nucleus are adopted by human immunodeficiency virustype 1 [11]

The main intronic splicing signals are the 51015840 splice site(51015840ss) or splice donor site the branch site the polypyrimidine

2 International Journal of Evolutionary Biology

tract and the 31015840 splice site (31015840ss) or splice acceptor siteThe 51015840ss marks the 51015840 end (beginning) of the intron and inalmost 99 of cases begins with the ldquocanonicalrdquo dinucleotideGT followed by a few varying nucleotides The branch siteis a very short sequence including an adenine nucleotideThe polypyrimidine tract is a sim15-nucleotide sequence witha rich content of Cs and Ts The 31015840ss marks the 31015840 endof the intron and in almost 99 of cases it ends withthe ldquocanonicalrdquo dinucleotide AG which is preceded by afew varying nucleotides The polypyrimidine tract is locatedimmediately upstream of the 31015840ss and the branch site liesfurther upstream at a short distance In 1 of the intronsthe canonical 51015840ss-31015840ss combination GTndashAG is replaced bynoncanonical combinations such as GCndashAG or ATndashAC

Besides the above-mentioned main splicing signals itis believed that other sections of the intron may harboradditional ldquosplicing motifsrdquo which may contain consensussequences for specific nuclear proteins In addition bothintrons (I) and exons (E) possibly harbor splicing enhancer(ISE and ESE) and splicing silencer (ISS and ESS) motifsHowever these splicing motifs are neither specific norconstantly present and the ldquosplicing coderdquo may be context-dependent and result from an integration of several differentinputs [12ndash23]

The spliceosome is a massive ribonucleoprotein complexcomprising some small ldquoUrdquos nuclear RNAs (snRNAs) plusdifferent specific proteins these complexes being referred toas small nuclear ribonucleoproteins (snRNPs or SNURPs)In addition more than a hundred other protein factorscooperate in splicing Two types of spliceosomal intronshave been describedU2 snRNP-dependent introns (themaingroup) which use U1 U2 U4 U5 and U6 snRNPs (with threesubtypes according to the terminal dinucleotides GTndashAGGCndashAG and ATndashAC) and U12 snRNP-dependent intronswhich use U11 U12 U4atac U6atac and U5 snRNPs (withtwo subtypes according to the terminal dinucleotides ATndashAC and GTndashAG) [12ndash15 24ndash27]

Large-scale statistical analyses of the splice siteshave been made in model species of vertebratesinvertebrates fungi protozoa and plants [25ndash30] Compre-hensive databases have also been generated for examplehttpwwwsoftberrycomspldbSpliceDBhtml [29ndash31] and[26] Cumulatively the above-quoted papers report dataon the global nucleotide patterns at the splice sites in eachspecies that is the frequency of each base at a given positionwith reference to the intron boundaries (including theflanking sections of the neighboring exons) the informationcontent of the intronic sequences which are bound bythe splicing factors (as a measure of their conservation)and the evolution of the splicing factors in parallel withthe evolution of the intronic splicing signals We deemedthat a comparative analysis of the intronic splice signals inorthologous genes (ie genes which derived from a commonancestor and diverged following speciation events) at theindividual orthologous splice sites could throw further lighton the evolution of the splice sites during the process ofspeciation In some topographically corresponding splicesites of orthologous genes of Mouse and Human thesignaling sequences are identical whereas in others they

are different It is likely that at least in the great majority ofcases the unaltered sequences are derived from the commonancestor and were conserved during the process of divergentspeciation Thus comparative analysis of orthologous splicesites could indicate which specific signaling sequencestend to be more conserved and thence the direction of theselective pressure in the time interval from the beginningof the speciation process to the present day To this endwe considered a group of cytokine receptor genes whichare orthologous in Mouse and Human two species whichdiverged quite recently some 65ndash85 MYA (million years ago)[32 33] We also examined the Mouse and Human transcriptvariants in this group of receptors with the aim of spottingthose splicing signals which are more frequently silencedor replaced by other potential splicing signals at a differentposition in the gene

2 Materials and Methods

This study was made on a group of orthologous genesof Mouse and Human coding for cytokine receptors[34 35] (A database of homologous genes is availableat httpwwwncbinlmnihgovhomologene a databaseof orthologous genes is available at httpinparanoidsbcsusecgi-binindexcgi) A structural classification of cytokinereceptors (mainly according to Coico and Sunshine 2009)divides the receptors into the following groups (1)immunoglobulin superfamily receptors (2) class I cytokinereceptor family (3) class II cytokine receptor family(Interferon receptor family) (4) tumor necrosis factorreceptor superfamily (5) chemokine receptor family (6)transforming growth factor beta receptor family In thisstudy we analyzed 26 receptors belonging to all thesegroups as indicated in Table 1 Sequences were retrievedfrom the NCBI GenBank (httpwwwncbinlmnihgov)We selected the splice variants of these genes which showedsuperimposable exon-intron arrangements in Mouse andHuman and exons aligning with nucleotide identities of 70or higher Eventually 216 MouseHuman couples of intronswere extracted In addition we compared these canonicaltranscripts with the Mouse and Human transcript variants ofthe same genes

Nucleotide sequence alignments without gaps of eachcouple of orthologous introns were made using a programavailable at httpmultalintoulouseinrafrmultalin [36]

All probabilistic comparisons between percentages weremade using the binomial distribution (one-tailed tests) withthe original actual figures [37]

3 Results

31 Nucleotide Identities at Corresponding Positions at theEnds of Mouse and Human Introns Nucleotide identitieswere recorded in orderly manner in the first 50 nucleotidesof each MouseHuman couple of introns (Figure 1) All the216 couples of introns considered started with the canonicaldinucleotideGTand identitywas of course 100at positions1 and 2 Then identities averaged 843 903 935 and

International Journal of Evolutionary Biology 3

0

25

50

75

100

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50

Figure 1 Nucleotide identities () in the first 50 nucleotides of eachMouseHuman couple of orthologous introns indicating the probabilityof nucleotide conservation at any given position

0

25

50

75

100

50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

Figure 2 Nucleotide identities () in the last 50 nucleotides of eachMouseHuman couple of orthologous introns indicating the probabilityof nucleotide conservation at any given position

736 at positions 3 4 5 and 6 respectively In positions 7 to50 the average identities were lower and remained relativelyconstant with a mean value of 554 (the horizontal line inFigure 1)

Nucleotide identities were also recorded in the last 50nucleotides (numbered from 50 to 1) of each MouseHumancouple of introns (Figure 2) All couples of introns endedwiththe canonical dinucleotide AG and thus identity was 100at positions 1 and 2 Then identities averaged 773 597727 and 722 at positions 3 4 5 and 6 respectively Inpositions 7 to 21 the average identities remained relativelyconstant with a mean value of 631 (the right horizontalline in Figure 2) In the last positions (22 to 50) the averageidentities remained relatively constant with a mean value of560 (the left horizontal line in Figure 2)

From these preliminary data as well as from otherliterature data [25] the initial and terminal hexanucleotides ofthe introns appeared to be the best characterized componentsof the 51015840ss and the 31015840ss respectively We thus undertook amore detailed analysis of these sequences

32 The Initial Hexanucleotides of the Introns The possiblenumber of hexanucleotides beginningwithGT is 44 = 256 Ofthese 51 (cumulatively in Mouse and Human) were found inour sample which does not exclude the possibility that otherhexanucleotides are used in other genes The more frequentinitial hexanucleotides are listed in Table 2 The per centincidences of these hexanucleotides inMouse andHuman donot differ significantly (119875 gt 005)

The nucleotide composition according to the position inthe initial hexanucleotides in Mouse and Human is shown in

Table 3 columns 1ndash4 The per cent incidences of the differentnucleotides in Mouse and Human at each position did notdiffer significantly (119875 gt 005) except in the case of the sixthnucleotide where C was significantly more represented inMouse as compared to Human

33 The Terminal Hexanucleotide of the Introns Of the 256possible hexanucleotides ending with AG 94 (cumulativelyin Mouse and Human) were found in our sample whichdoes not exclude the possibility that other hexanucleotidesare used in other genes The more represented terminalhexanucleotides are listed in Table 4 The per cent incidencesof these hexanucleotides in Mouse and Human did not differsignificantly (119875 gt 005)

The nucleotide composition according to the positionin the terminal hexanucleotides in Mouse and Human isshown in Table 5 columns 1ndash4The per cent incidences of thedifferent nucleotides in Mouse and Human at each positiondid not differ significantly (119875 gt 005) except in the case of thefirst nucleotide where G was significantly more representedin Mouse as compared to Human It is noteworthy that inour sample the nucleotide G was never present in the fourthposition so that no hexanucleotide ended by GAG

The limits of the polypyrimidine tract are not well definedand the two leftmost nucleotides (C- and T-rich positions 1and 2 Table 4) of the terminal hexanucleotide might in factbe the rightmost part of the polypyrimidine tract

34 The ldquoBifunctionalrdquo (Initial or Terminal) Hexanucleotidesof the Introns All the 16 (42) hexanucleotides beginningby GT and ending by AG may in theory act as both 51015840ss

4 International Journal of Evolutionary Biology

Table 1 The cytokine receptors analyzed in this study (1) is theimmunoglobulin superfamily receptors (2) is the class I cytokinereceptor family (3) is the class II cytokine receptor family (inter-feron receptor family) (4) is the tumor necrosis factor receptorsuperfamily (5) is the chemokine receptor family (6) is thetransforming growth factor beta receptor family

RECEPTOR (Receptor family) (Number of exons)Interleukin 1 receptor type I (IL1R1) (1) (10 exons)C-Kit Hardy-Zuckerman 4 Feline Sarcoma Viral OncogeneHomolog (KIT) (1) (21 exons)Interleukin 2 receptor alpha (IL2RA) (2) (8 exons)Interleukin 2 receptor beta (IL2RB) (2) (9 exons)Interleukin 2 receptor gamma (IL2RG) (2) (8 exons)Interleukin 4 receptor (IL4R) (2) (9 exons)Interleukin 13 receptor alpha 1 (IL13RA1) (2) (11 exons)Interleukin 13 receptor alpha 2 (IL13RA2) (2) (9 exons)Erythropoietin receptor (EPOR) (2) (8 exons)Colony stimulating factor 2 receptor beta low-affinity(granulocyte-macrophage) (CSF2RB) (2) (13 exons)Colony stimulating factor 3 receptor (granulocyte) (CSF3R) (2)(15 exons)Leukemia inhibitory factor receptor alpha (LIFR) (2) (19 exons)Prolactin receptor (PRLR) (2) (8 exons)Oncostatin M receptor (OSMR) (2) (17 exons)Interleukin 10 receptor alpha (IL10RA) (3) (7 exons)Interleukin 10 receptor beta (IL10RB) (3) (7 exons)Interferon (alpha beta and omega) receptor 1 (IFNAR1) (3) (11exons)CD27 molecule (CD27) (4) (6 exons)CD40 molecule TNF receptor superfamily member 5 (CD40) (4)(9 exons)Lymphotoxin beta receptor (TNFR superfamily member 3)(LTBR) (4) (10 exons)Chemokine (C-C motif) receptor 7 (CCR7) (5) (3 exons)Chemokine (C-C motif) receptor 9 (CCR9) (5) (2 exons)Chemokine (C-C motif) receptor 10 (CCR10) (5) (2 exons)Chemokine (C-X-C motif) receptor 4 (CXCR4) (5) (2 exons)Chemokine (C-X-C motif) receptor 5 (CXCR5) (5) (2 exons)Transforming growth factor beta receptor III (TGFBR3) (6) (16exons)

and 31015840ss (ldquobifunctionalrdquo hexanucleotides) Of these 8 neverappeared as initial or terminal hexanucleotides in our sampleand 3 (GTAAAG GTACAG and GTTAAG) appeared at thebeginning (but never at the end) of some introns while 5(GTCCAG GTGCAG GTGTAG GTTCAG and GTTTAG)appeared at the end (but never at the beginning) of otherintrons in both Mouse and Human

35 Conservation of the Individual Nucleotides in the Ini-tial Hexanucleotides of the Introns The structural conserva-tionmutation in the couples of MouseHuman orthologousinitial hexanucleotides was studied In more than half of

Table 2 The initial more frequently expressed hexanucleotidesPercentages are over the total of 216 couples of hexanucleotides

Sequence inMouse

inHuman

Average()

GTGAGT 190 181 185GTAAGT 190 171 181GTAAGA 88 88 88GTGAGA 60 51 56GTGAGG 42 69 56GTAAGG 37 51 44GTAAGC 42 28 35Total 649 639 645

0

10

20

30

40

50

60

0 1 2 3 4

Figure 3Abscissa number of nucleotide differences (irrespective ofthe position) in orthologousMouseHuman initial hexanucleotidesOrdinate percentage

the couples (574) there was a complete identity betweenthe Mouse and Human orthologous hexanucleotides Adecreasing percentage of couples exhibited one two or threenucleotide differences (regardless of position) in no instanceall four variable nucleotides changed (Figure 3) The numberof changes per hexanucleotide averaged 058

We calculated the probability of random occurrenceof the same nucleotide at a given position in orthologousMouseHuman hexanucleotides (Table 3 column 5) Forinstance A in the third position is present with probability0597 in the Mouse and 0570 in the Human (Table 3GTAxxx columns 2 and 3) then the probability of randomoccurrence of A at that position in both Mouse and Humanis 0597 times 0570 = 03403 (or 3403) (binomial distribution119899 = 216) The actual occurrence () of nucleotide identitiesat each position is reported in column 6 of Table 3 Thepercentages of actual identities were always significantly (119875 lt001) higher than the percentages of random identities andthe difference expresses the level of nucleotide biologicalconservation In the nucleotides of the initial hexanucleotidesthe actual total conservation was about 53 higher than thecalculated purely random conservation

36 Conservation of Groups of Nucleotides in the InitialHexanucleotides of the Introns The contribution of groupsof bases in the initial hexanucleotides to enhancing or

International Journal of Evolutionary Biology 5

Table 3 Columns 1ndash4 the nucleotide composition according to the position in the initial hexanucleotides in Mouse and Human Columns1 and 5ndash7 analysis of the conservation of individual nucleotides in the initial hexanucleotides of the introns Expected random conservationin column 5 and actual conservation in column 6 Percentages are over the total of 216 couples of hexanucleotides

1 2 3 4 5 6 7Sequence in Mouse in Human Random conservation () Actual conservation ()GTAxxx 597 575 3403 5093 119875 lt 001

GTCxxx 19 23 004 185 119875 lt 001

GTGxxx 370 384 1421 3009 119875 lt 001

GTTxxx 14 23 003 139 119875 lt 001

Total 4831 8426 119875 lt 001

GTxAxx 824 810 6674 7778 119875 lt 001

GTxCxx 28 28 008 231 119875 lt 001

GTxGxx 69 83 057 417 119875 lt 001

GTxTxx 79 79 062 602 119875 lt 001

Total 6801 9028 119875 lt 001

GTxxAx 79 74 058 648 119875 lt 001

GTxxCx 14 18 003 093 119875 lt 001

GTxxGx 852 843 7182 8241 119875 lt 001

GTxxTx 55 65 036 370 119875 lt 001

Total 7279 9352 119875 lt 001

GTxxxA 218 241 525 1620 119875 lt 001

GTxxxC 116 74 119875 = 002 086 370 119875 lt 001

GTxxxG 166 199 330 833 119875 lt 001

GTxxxT 500 486 2430 4537 119875 lt 001

Total 3371 736 119875 lt 001

Table 4 The terminal more frequently expressed hexanucleotidesPercentages are over the total of 216 couples of hexanucleotides

Sequence in Mouse in Human Average ()TTTCAG 74 65 69CTGCAG 79 46 63TTTTAG 51 60 56TTGCAG 37 69 53TTCTAG 32 65 49CCACAG 46 32 39TTCCAG 51 28 39Total 370 365 368

reducing the level of MouseHuman conservation was thenstudied The random probability of occurrence of completehexanucleotides or groups of three or two nucleotides atgiven positions was calculated on the basis of the percentagesof the actual conservation of the individual componentnucleotides at the corresponding positions as reported inTable 3 (column 6) For example the random probability ofthe hexanucleotide GTAAGT is

10000 times 10000 times 05093 times 07778

times 08241 times 04537 = 01481 (or 1481)(1)

(binomial distribution 119899 = 216) and the random probabilityof the sequence GTAAGx is

10000 times 10000 times 05093 times 07778 times 08241

= 03265 (or 3265) (2)

The expected random probabilities were compared withthe actual frequencies Analysis of the initial hexanu-cleotides showed that the sequences GTGAGx GTGAxTand GTGxGT are significantly more expressed than could beexpected (Table 6) Analysis of all the hexanucleotides whichincluded these sequences demonstrated that in GTGAGT(which included all three overexpressed sequences) the over-expression was highly significant (expected 875 actual1389 119875 lt 001) overexpression of GTGAGG (whichincluded the GAG motif only) was less significant possiblydue to the lower frequency (expected 161 actual 324119875 = 006) None of the other hexanucleotides whichincluded some of the overexpressed sequences exhibitedsignificantly different expression levels from those expecteddue to the fact that they contained other not overexpressed orunderexpressed motifs

Sequences GTAAGx and GTAxGG were found to besignificantly less expressed than could be expected (Table 6)The hexanucleotide GTAAGG which included both under-expressed sequences was clearly underexpressed although ata low significance level due to the low frequency (expected272 actual 093 119875 = 007) All other hexanucleotideswhich included some of the underexpressed sequences didnot exhibit significantly different expression levels from thoseexpected

Another observation supports the role of the trinu-cleotides GAG and AAG in the association to overexpres-sion and underexpression respectively The hexanucleotidesGTGAGT and GTAAGT differ at the third nucleotide onlyAs at the third nucleotide the actual conservation of Ais 169 times the actual conservation of G (50933009

6 International Journal of Evolutionary Biology

Table 5 Columns 1ndash4 the nucleotide composition according to the position in the terminal hexanucleotides inMouse andHuman Columns1 and 5ndash7 analysis of the conservation of individual nucleotides in the terminal hexanucleotides of the introns Expected random conservationin column 5 and actual conservation in column 6 Percentages are over the total of 216 couples of hexanucleotides

1 2 3 4 5 6 7Sequence in Mouse in Human Random conservation () Actual conservation ()AxxxAG 51 56 029 231 119875 lt 001

CxxxAG 352 338 119 2361 119875 lt 001

GxxxAG 83 46 119875 = 001 038 370 119875 lt 001

TxxxAG 514 560 2878 4259 119875 lt 001

Total 4135 7221 119875 lt 001

xAxxAG 79 93 073 509 119875 lt 001

xCxxAG 259 222 575 1389 119875 lt 001

xGxxAG 83 83 069 509 119875 lt 001

xTxxAG 579 602 3486 4861 119875 lt 001

Total 4203 7268 119875 lt 001

xxAxAG 273 269 734 1713 119875 lt 001

xxCxAG 273 291 794 1667 119875 lt 001

xxGxAG 199 213 424 926 119875 lt 001

xxTxAG 255 227 579 1667 119875 lt 001

Total 2531 5973 119875 lt 001

xxxAAG 74 70 052 602 119875 lt 001

xxxCAG 630 597 376 5046 119875 lt 001

xxxGAG 00 00 000 000xxxTAG 296 333 986 2083 119875 lt 001

Total 4798 7731 119875 lt 001

Table 6 Conservation of groups of nucleotides (underlined) in the initial hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

GTGAGx 1929 2546 617 002GTGAxT 1062 1435 373 005GTGxGT 1125 1528 403 004

GTAAGx 3265 2639 minus626 003GTAxGG 350 093 minus257 002

Table 3) GTAAGT could be expected to be more expressedthan GTGAGT by about 70 On the contrary GTGAGTwas actually more expressed than GTAAGT by about 2(185181 Table 2) and indeed it was the most frequentlyexpressed initial hexanucleotide Similarly GTAAGG shouldbe more expressed than GTGAGG by 70 but actuallyGTGAGG was more expressed than GTAAGG by about 27(5644 Table 2)

37 Conservation of the Individual Nucleotides in the TerminalHexanucleotides of the Introns Conservationmutation inthe couples of MouseHuman orthologous terminal hexanu-cleotides was also studied In 282 only of the couples wasthere a complete identity between the Mouse and Humanorthologous hexanucleotides In a higher percentage of cases(379) one nucleotide (regardless of position) was changedA decreasing percentage of couples exhibited two or threenucleotide differences and in only 09 of cases were allfour variable nucleotides changed (Figure 4) The number ofchanges per hexanucleotide averaged 117

The probability of random occurrence of the samenucleotide at a given position in orthologous MouseHumanhexanucleotides was calculated as previously reported(Table 5 column 5) The actual occurrence () of nucleotideidentities at each position is reported in column 6 of Table 5The percentages of actual identities were always significantly(119875 lt 001) higher than the percentages of random identitiesIn the nucleotides of the terminal hexanucleotides the actualtotal conservation was about 80 higher than the calculatedpurely random conservation

38 Conservation of Groups of Nucleotides in the TerminalHexanucleotides of the Introns The contribution of groupsof bases in the terminal hexanucleotides in favoring theMouseHuman conservation was studied by comparing theirexpected random probabilities with the observed actualfrequencies (see previous paragraph for the computing ofrandom probabilities and Table 5 column 6 for the actualconservation of the individual nucleotides at the differentpositions)

International Journal of Evolutionary Biology 7

Table 7 Conservation of groups of nucleotides (underlined) in the terminal hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

TTTxAG 345 plusmn 268 880 535 lt001TTxTAG 431 plusmn 298 926 495 lt001xTTTAG 169 plusmn 189 463 294 lt001TxTTAG 148 plusmn 177 417 269 lt001xTGCAG 227 plusmn 219 509 282 001

TTxxAG 2070 plusmn 595 2963 893 lt001xxTTAG 347 plusmn 269 648 301 002xTGxAG 450 plusmn 305 833 383 lt001

0

10

20

30

40

50

60

0 1 2 3 4

Figure 4 Abscissa number of nucleotide differences (irrespectiveof the position) in orthologous MouseHuman terminal hexanu-cleotides Ordinate percentage

Analysis of the terminal hexanucleotides showed thatthe nucleotide sequences TTTxAG TTxTAG xTTTAGTxTTAG and xTGCAG are significantly more expressedthan could be expected (Table 7) Furthermore some dinu-cleotides derived from the above sequences are themselvesoverexpressed TTxxAG (derived from TTTxAG and TTx-TAG) xxTTAG (derived from xTTTAG and TxTTAG) andxTGxAG (derived from xTGCAG) (Table 7)

Three hexanucleotides were found to be significantlyoverexpressed as compared to the random probabilityTTTCAG (expected 174 actual 463 119875 lt 001)TTTTAG (expected 072 actual 370 119875 lt 001) andCTGCAG (expected 054 actual 185 119875 = 003) Allthese hexanucleotides contained at least one of the conservedmotifs Other hexanucleotides although containing some ofthe overexpressed trinucleotides did not exhibit significantlydifferent conservation levels from those of the randomprobability and in no case did the occurrence of one of theconserved dinucleotides determine a higher conservation ofthe corresponding hexanucleotides

No motif associated with a significant underexpressionwas found

39 Splicing Variants Splicing variants of the individualgenes are usually of a different type in Mouse and Humanso the positions of the splicing sites no longer correspondin the two species (ie the orthology of the splicing sites is

lost) even though the main traits of the protein products arepreserved In only one instance in receptor KIT Mouse andHuman exhibited the same type of variant that is the use ofanother 51015840ss 12 nucleotides upstream of the canonical site

In several cases one entire exon present in the canonicalform is lacking in a variant although the other downstreamexons are regularly transcribed indicating that the constitu-tive 31015840ss at the end of the preceding intron did not operate Asan example in the transcript variant X1 (XM 005252446) ofthe Human IL2RA the canonical 31015840ss TTCCAG immediatelyupstream of exon 4 is silenced and the fourth exon (216 nt72 aa) of the canonical sequence (NM 000417) is lackingin this variant In other cases up to three consecutive 31015840sssare silenced in the Human IL2RG transcript variant X2(XM 005262262) 31015840sss ATCTAG CTCTAG and CTCCAGare all silenced with loss of exons 2 3 and 4 while exons5 6 7 and 8 are transcribed as in the canonical form(NM 000206) without alteration of the reading frame

In other cases the active alternative ss may be locatedupstream or downstream of the canonical ss resulting in analteration of the length of individual exons usually withoutframeshifts For example in theHuman variant X1 of CSF2RB(XM 005261340) the intron upstream of exon 7 ends byan ldquoearlyrdquo CCTCAG while the corresponding intron in thecanonical sequence (NM 007780) ends at another CCTCAG18 nt downstream resulting in an 18-nt longer exon in thevariant Although variations of the 31015840sss are more frequentthe 51015840sss may also be variable As an example in Humantranscript variant X1 of CD40 (XM 005260617) the 51015840ss afterexon 7 is GTGGGG replacing the GTGAGT of the canonicalvariant (NM 001250) which occurs 12 nt upstream

Detailed results are shown in the SupplementaryMaterialavailable at httpdxdoiorg1011552013818954

310 The Binding of the U1 snRNA We studied the ungappedbase-pairing between the U1 snRNA sequence ACTTAC(NCBI GenBank accession code NR 004430) conserved inMouse and Human and the last three nucleotides of theexons plus the corresponding 51015840ss hexanucleotide The 51015840ssGTAAGT which exactly base-pairs with the U1 is expressedin 190 of cases in Mouse and 171 of cases in Human(Table 2) In all other cases the match is not perfect Onaverage a complementary match between the U1 nucleotidesand the four variable nucleotides of the 51015840ss is achieved in

8 International Journal of Evolutionary Biology

67 of nucleotides The highest match is with the secondvariable nucleotide (the fourth of the 51015840ss) with 78 ofoccurrences while the matching of all the other variablenucleotides averages 50 Indeed all seven more expressed51015840ss of Mouse and Human have an ldquoArdquo in the fourth position(Table 2) The lowest matching is with the sixth nucleotideHowever the number of matches in the individual 51015840sss isvery variable and in a few cases (eg GTTTCG) only theinitial GTmatcheswithU1 In very few caseswe observed thatthe optimal ungapped base-pairing of U1 could be achievedby partly binding the last three nucleotides of the exon

4 Discussion

This study was aimed at describing the conservationmuta-tion dynamics of the intron ends of the cytokine receptorgenes during the speciation processes to Mouse and Humanwhich began starting from a common ancestor 65ndash85 MYA[32 33] We selected 26 orthologous genes of the two speciesin which the MouseHuman topographic correspondence ofexons had been consistently preserved The selected genescoded for receptors belonging to different cytokine receptorgroups (Table 1)We firstly identified the nucleotide identitiesat the corresponding positions in Mouse and Human in thefirst 50 and last 50 nucleotides of each intron All intronsstudied (216 couples) started and ended with the canonicaldinucleotides GT and AG respectively conforming to theusual situation This preliminary analysis demonstrated thatin general the four nucleotides following GT at the 51015840ss andthe four nucleotides preceding AG at the 31015840ss were morehighly conserved as compared to the other nucleotides atboth ends of the introns (Figures 1-2) We thus undertooka more detailed analysis of the initial and terminal intronichexanucleotides

All the 16 hexanucleotides beginning by GT and endingby AG might act both as 51015840ss and 31015840ss In our sample 8 ofthese sequences appeared either at 51015840 or at 31015840 but never inboth It may thus be hypothesized that some mechanism ofdisambiguation may operate possibly related to other exonicor intronic ldquosignalsrdquo involved in the splicing processes

The structure of the hexanucleotides actually expressed atthe beginning and end of the introns is quite variable even ifsome configurations are more frequent (Tables 2 and 4) Inour sample the initial hexanucleotide appeared in 51 differentconfigurations while the terminal hexanucleotide appearedin 94 different configurations indicating a lower selectivepressure on the terminal hexanucleotideThis is confirmed bythe MouseHuman conservationmutation data the averagemutation per hexanucleotide was 058 in the initial hexanu-cleotides and 117 in the terminal hexanucleotides and thetotal hexanucleotide conservation was 574 for initial hex-anucleotides and only 282 for terminal hexanucleotides

The percentages of occurrence of the more expressedinitial and terminal hexanucleotides did not differ signifi-cantly inMouse and Human (Tables 2 and 4)The percentageof the individual nucleotides at each position in the initialand terminal hexanucleotides was also similar in Mouseand Human and our results are in general comparable to

those reported in the literature [25 27 28] In our samplein the sixth nucleotide of the initial hexanucleotides C wassignificantly more expressed in the Mouse than in Humanand in the first nucleotide of the terminal hexanucleotidesG was significantly more expressed in the Mouse than inHuman (Tables 3 and 5) In our sample the nucleotideG was never present in the fourth position of terminalhexanucleotides

From the data on the actual frequencies of the indi-vidual nucleotides at each position in the initial and ter-minal hexanucleotides in Mouse and Human separately wecalculated the probability of random conservation of anygiven nucleotide at a given position in both species Theresults are shown in Table 3 (initial hexanucleotides) andTable 5 (terminal hexanucleotides) together with the resultsobtained on the actual conservation of each nucleotide ata given position in both Mouse and Human As shown inthe Tables the actual conservation was always significantlyhigher than the random conservation On average the actualconservation exceeded by 53 the random conservationin the initial hexanucleotides and by 80 in the terminalhexanucleotides These ldquogainsrdquo express the conservation dueto a selective evolutionary pressure to keep the configurationof the common ancestor at the level of the individualnucleotides

We also analyzed the conservation of groups of the vari-able nucleotides in initial and terminal hexanucleotides Therandom conservation of groups of three or two nucleotideswas calculated and compared with the actual frequenciesAll the motifs of the initial hexanucleotides found to beassociated with a higher or a lower conservation are reportedin the Results section The most remarkably conservedtrinucleotide was xxGAGx while xxAAGx was significantlyunderexpressed Accordingly GTGAGT and GTGAGGwereoverexpressed while GTAAGG was underexpressed Since inthe third position A is more conserved than G by about70 GTGAGT and GTGAGG should be less expressedthan GTAAGT and GTAAGG respectively by a similarpercentage Actually GTGAGT and GTGAGG were moreexpressed than the similar sequences with A in the thirdposition so that in our sample GTGAGT was the mostfrequently expressed initial hexanucleotide and GTAAGTwas only the second most frequently occurring sequence(Table 2)

In the terminal hexanucleotides the sequences TTTxAGand xTGCAG were overexpressed and accordinglyTTTCAG TTTTAG and CTGCAG were more expressedthan the random expectance These three sequences arethe most frequently expressed terminal hexanucleotides(Table 4) No specific motif entailing terminal nucleotideunderexpression could be demonstrated

It should be remarked that the presence of overex-pressed or underexpressed trinucleotides seems to be acondition which although necessary is not sufficient perse to entail over- or underexpression of the correspondinghexanucleotides since the outcome also depends on the othervariable nucleotide

The present results suggest that the conservation of eachnucleotide of the initial and terminal hexanucleotides is

International Journal of Evolutionary Biology 9

dependent upon the conservationmutation of the othernucleotides and an evolutionary selective pressure oper-ates to keep certain global configurations derived from theMouseHuman common ancestor while other configurationstend to be less conserved In particular the initial con-figuration GTAAGT which corresponded to the maximalconservation of the individual nucleotides at positions 3 to6 (Table 3 column 6) and is the one which exactly base-pairswith U1 snRNA appeared to be negatively biased

In general the requirement for base-pairing of the initialhexanucleotide with U1 seems not to be stringent since inour sample complete matching was observed in 18 only ofthe 51015840sss and 51015840sss with as few as three matching nucleotidesbeing effective (13 of cases) In very few cases optimal base-pairing ofU1 could be achieved by partly binding the terminaltrinucleotide of the exon

Analysis of the transcript variants of the genes consideredin the present study revealed that no type of variant iscommon in Mouse and Human except in one single caseThis suggests that these variants were not inherited fromthe common ancestor but emerged only later during thespeciation process One frequent variant is the silencingof a constitutive 31015840ss which leads to the skipping of theimmediately downstream exon with transcription being thenresumed for the other downstream exons Another type ofvariant is the activation of an alternative 51015840ss or 31015840ss usuallylocated a few nucleotides upstream or downstream of thecanonical splicing site the latter being silenced This leadsto alterations of the nucleotide number of the neighboringexons but usually without frameshifts Most constitutive sssespecially at 51015840 are robust enough to be maintained steadilyactive but 104 of the 31015840sss and 32 of the 51015840sss were foundto be variable Our analysis could not reveal any specific trendtowards the silencing of constitutive sss or the activation ofalternative sss as certain hexanucleotides which are silencedat given sites are on the contrary activated at other sites toreplace constitutive sss as for example the 51015840sss GTGAGAand GTGGGA and the 31015840ss CTCCAG Possibly silencingor activation may depend in a complex way on the generalcontext [20]

To conclude analysis of the nucleotide conservation inthe 51015840ss and 31015840ss hexanucleotides of Mouse and Humanintrons reveals definite evolutionary positive biases towardsthe conservation of some specific configurations and neg-ative biases against other configurations However the 51015840sshexanucleotides and especially the 31015840ss hexanucleotides stillexhibit a wide structural variability confirming the con-tention that splicing depends on a complex code whichbesides the 51015840ss and 31015840ss possibly involves additional signalsfrom the neighboring exons or from sections of the intronother than the splice sites and might also be regulated by thesecondary and tertiary structures forming in the pre-mRNAmolecule [12ndash14 16ndash23]

Acknowledgments

The authors gratefully thank Doctor Mary Victoria CandacePragnell and Professor Vincenzo Mitolo for constructivecriticisms and reviewing the paper

References

[1] S Stamm S Ben-Ari I Rafalska et al ldquoFunction of alternativesplicingrdquo Gene vol 344 pp 1ndash20 2005

[2] A J Matlin F Clark and C W J Smith ldquoUnderstanding alte-rnative splicing towards a cellular coderdquoNature ReviewsMolec-ular Cell Biology vol 6 no 5 pp 386ndash398 2005

[3] M Hallegger M Llorian and C W J Smith ldquoAlternativesplicing global insights minireviewrdquo FEBS Journal vol 277 no4 pp 856ndash866 2010

[4] E Conti and E Izaurralde ldquoNonsense-mediated mRNA decaymolecular insights and mechanistic variations across speciesrdquoCurrent Opinion in Cell Biology vol 17 no 3 pp 316ndash325 2005

[5] F Lejeune and L E Maquat ldquoMechanistic links between non-sense-mediated mRNA decay and pre-mRNA splicing in mam-malian cellsrdquo Current Opinion in Cell Biology vol 17 no 3 pp309ndash315 2005

[6] J Weischenfeldt J Lykke-Andersen and B Porse ldquoMessengerRNA surveillance neutralizing natural nonsenserdquoCurrent Biol-ogy vol 15 no 14 pp R559ndashR562 2005

[7] E V Koonin ldquoThe origin of introns and their role in eukaryo-genesis a compromise solution to the introns-early versusintrons-late debaterdquoBiologyDirect vol 1 no 22 pp 1ndash23 2006

[8] J P Orengo and T A Cooper ldquoAlternative splicing in diseaserdquoAdvances in Experimental Medicine and Biology vol 623 pp212ndash223 2007

[9] A S Solis N Shariat and J G Patton ldquoSplicing fidelity enhan-cers and diseaserdquo Frontiers in Bioscience vol 13 no 5 pp 1926ndash1942 2008

[10] C A Pettigrew and M A Brown ldquoPre-mRNA splicing aberra-tions and cancerrdquo Frontiers in Bioscience vol 13 no 3 pp 1090ndash1105 2008

[11] M A Panaro V Mitolo A Cianciulli P Cavallo C I MitoloandAAcquafredda ldquoTheHIV-1 Rev binding family of proteinsthe dog proteins as a study modelrdquo Endocrine Metabolic andImmune Disorders vol 8 no 1 pp 30ndash46 2008

[12] M D Adams D Z Rudner and D C Rio ldquoBiochemistry andregulation of pre-mRNA splicingrdquo Current Opinion in CellBiology vol 8 no 3 pp 331ndash339 1996

[13] A R Krainer Eukaryotic mRNA Processing Oxford UniversityPress New York NY USA 1997

[14] C B Burge T Tuschl and P A Sharp Splicing of Precursors tomRNAs by the Spliceosome Cold Spring Laboratory HarborPress Cold Spring Harbor NY USA 1999

[15] B J Blencowe ldquoExonic splicing enhancers mechanism ofaction diversity and role in human genetic diseasesrdquo Trends inBiochemical Sciences vol 25 no 3 pp 106ndash110 2000

[16] X H Zhang and L A Chasin ldquoComputational definition ofsequence motifs governing constitutive exon splicingrdquo Genesand Development vol 18 no 11 pp 1241ndash1250 2004

[17] J Hui L Hung M Heiner et al ldquoIntronic CA-repeat and CA-rich elements a new class of regulators of mammalian alterna-tive splicingrdquoEMBO Journal vol 24 no 11 pp 1988ndash1998 2005

[18] J L Kabat S Barberan-Soler P McKenna H Clawson TFarrer and A M Zahler ldquoIntronic alternative splicing regula-tors identified by comparative genomics in nematodesrdquo PLoSComputational Biology vol 2 no 7 p e86 2006

[19] C Dominguez and F H-T Allain ldquoNMR structure of the threequasi RNA recognition motifs (qRRMs) of human hnRNP Fand interaction studies with Bcl-x G-tract RNA a novel modeof RNA recognitionrdquo Nucleic Acids Research vol 34 no 13 pp3634ndash3645 2006

10 International Journal of Evolutionary Biology

[20] L A Chasin ldquoSearching for splicing motifsrdquo Advances in Expe-rimental Medicine and Biology vol 623 pp 85ndash106 2007

[21] LHHungMHeiner J Hui S Schreiner V Benes andA Bin-dereif ldquoDiverse roles of hnRNP L in mammalian mRNA proc-essing a combined microarray and RNAi analysisrdquo RNA vol14 no 2 pp 284ndash296 2008

[22] M A Panaro A Cianciulli R Calvello et al ldquoAn analysis of thehuman chemokine CXC receptor 4 generdquo Immunopharmacol-ogy and Immunotoxicology vol 31 no 1 pp 88ndash93 2009

[23] J Zhang C C Kuo and L Chen ldquoGC content around splicesites affects splicing through pre-mRNA secondary structuresrdquoBMC Genomics vol 12 no 90 pp 1ndash11 2011

[24] TWNilsen ldquoThe spliceosome themost complexmacromolec-ularmachine in the cellrdquoBioEssays vol 25 no 12 pp 1147ndash11492003

[25] J F Abril R Castelo and R Guigo ldquoComparison of splice sitesin mammals and chickenrdquo Genome Research vol 15 no 1 pp111ndash119 2005

[26] N Sheth X Roca M L Hastings T Roeder A R Krainer andR Sachidanandam ldquoComprehensive splice-site analysis usingcomparative genomicsrdquo Nucleic Acids Research vol 34 no 14pp 3955ndash3967 2006

[27] S H Schwartz J Silva D Burstein T Pupko E Eyras and GAst ldquoLarge-scale comparative analysis of splicing signals andtheir corresponding splicing factors in eukaryotesrdquo GenomeResearch vol 18 no 1 pp 88ndash103 2008

[28] E V Kriventseva and M S Gelfand ldquoStatistical analysis of theexon-intron structure of higher and lower eukaryote genesrdquoJournal of Biomolecular Structure and Dynamics vol 17 no 2pp 281ndash288 1999

[29] M Burset I A Seledtsov and V V Solovyev ldquoSpliceDB dat-abase of canonical and non-canonical mammalian splice sitesrdquoNucleic Acids Research vol 29 no 1 pp 255ndash259 2001

[30] M Burset I A Seledtsov and V V Solovyev ldquoAnalysis ofcanonical and non-canonical splice sites in mammaliangenomesrdquoNucleic Acids Research vol 28 no 21 pp 4364ndash43752000

[31] V Shepelev and A Fedorov ldquoAdvances in the exon-intron dat-abase (EID)rdquo Briefings in Bioinformatics vol 7 no 2 pp 178ndash185 2006

[32] M Foote J P Hunter C M Janis and J J Sepkoski Jr ldquoEvolu-tionary and preservational constraints on origins of biologicgroups divergence times of eutherian mammalsrdquo Science vol283 no 5406 pp 1310ndash1314 1999

[33] M S Lee ldquoMolecular clock calibrations and metazoan diver-gence datesrdquo Journal of Molecular Evolution vol 49 no 3 pp385ndash391 1999

[34] K M Murphy P Travers and M Walport Janewayrsquos Immuno-biology Garland Science Oxford UK 7th edition 2007

[35] R Coico and G Sunshine Immunology A Short Course Wiley-Blackwell Hoboken NJ USA 6th edition 2008

[36] F Corpet ldquoMultiple sequence alignmentwith hierarchical clust-eringrdquo Nucleic Acids Research vol 16 no 22 pp 10881ndash108901988

[37] M L Samuels and J A Witmer Statistics for the Life SciencesPrentice Hall Upper Saddle River NJ USA 3rd edition 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor …downloads.hindawi.com/archive/2013/818954.pdf · 2019. 7. 31. · transforming growth factor beta

2 International Journal of Evolutionary Biology

tract and the 31015840 splice site (31015840ss) or splice acceptor siteThe 51015840ss marks the 51015840 end (beginning) of the intron and inalmost 99 of cases begins with the ldquocanonicalrdquo dinucleotideGT followed by a few varying nucleotides The branch siteis a very short sequence including an adenine nucleotideThe polypyrimidine tract is a sim15-nucleotide sequence witha rich content of Cs and Ts The 31015840ss marks the 31015840 endof the intron and in almost 99 of cases it ends withthe ldquocanonicalrdquo dinucleotide AG which is preceded by afew varying nucleotides The polypyrimidine tract is locatedimmediately upstream of the 31015840ss and the branch site liesfurther upstream at a short distance In 1 of the intronsthe canonical 51015840ss-31015840ss combination GTndashAG is replaced bynoncanonical combinations such as GCndashAG or ATndashAC

Besides the above-mentioned main splicing signals itis believed that other sections of the intron may harboradditional ldquosplicing motifsrdquo which may contain consensussequences for specific nuclear proteins In addition bothintrons (I) and exons (E) possibly harbor splicing enhancer(ISE and ESE) and splicing silencer (ISS and ESS) motifsHowever these splicing motifs are neither specific norconstantly present and the ldquosplicing coderdquo may be context-dependent and result from an integration of several differentinputs [12ndash23]

The spliceosome is a massive ribonucleoprotein complexcomprising some small ldquoUrdquos nuclear RNAs (snRNAs) plusdifferent specific proteins these complexes being referred toas small nuclear ribonucleoproteins (snRNPs or SNURPs)In addition more than a hundred other protein factorscooperate in splicing Two types of spliceosomal intronshave been describedU2 snRNP-dependent introns (themaingroup) which use U1 U2 U4 U5 and U6 snRNPs (with threesubtypes according to the terminal dinucleotides GTndashAGGCndashAG and ATndashAC) and U12 snRNP-dependent intronswhich use U11 U12 U4atac U6atac and U5 snRNPs (withtwo subtypes according to the terminal dinucleotides ATndashAC and GTndashAG) [12ndash15 24ndash27]

Large-scale statistical analyses of the splice siteshave been made in model species of vertebratesinvertebrates fungi protozoa and plants [25ndash30] Compre-hensive databases have also been generated for examplehttpwwwsoftberrycomspldbSpliceDBhtml [29ndash31] and[26] Cumulatively the above-quoted papers report dataon the global nucleotide patterns at the splice sites in eachspecies that is the frequency of each base at a given positionwith reference to the intron boundaries (including theflanking sections of the neighboring exons) the informationcontent of the intronic sequences which are bound bythe splicing factors (as a measure of their conservation)and the evolution of the splicing factors in parallel withthe evolution of the intronic splicing signals We deemedthat a comparative analysis of the intronic splice signals inorthologous genes (ie genes which derived from a commonancestor and diverged following speciation events) at theindividual orthologous splice sites could throw further lighton the evolution of the splice sites during the process ofspeciation In some topographically corresponding splicesites of orthologous genes of Mouse and Human thesignaling sequences are identical whereas in others they

are different It is likely that at least in the great majority ofcases the unaltered sequences are derived from the commonancestor and were conserved during the process of divergentspeciation Thus comparative analysis of orthologous splicesites could indicate which specific signaling sequencestend to be more conserved and thence the direction of theselective pressure in the time interval from the beginningof the speciation process to the present day To this endwe considered a group of cytokine receptor genes whichare orthologous in Mouse and Human two species whichdiverged quite recently some 65ndash85 MYA (million years ago)[32 33] We also examined the Mouse and Human transcriptvariants in this group of receptors with the aim of spottingthose splicing signals which are more frequently silencedor replaced by other potential splicing signals at a differentposition in the gene

2 Materials and Methods

This study was made on a group of orthologous genesof Mouse and Human coding for cytokine receptors[34 35] (A database of homologous genes is availableat httpwwwncbinlmnihgovhomologene a databaseof orthologous genes is available at httpinparanoidsbcsusecgi-binindexcgi) A structural classification of cytokinereceptors (mainly according to Coico and Sunshine 2009)divides the receptors into the following groups (1)immunoglobulin superfamily receptors (2) class I cytokinereceptor family (3) class II cytokine receptor family(Interferon receptor family) (4) tumor necrosis factorreceptor superfamily (5) chemokine receptor family (6)transforming growth factor beta receptor family In thisstudy we analyzed 26 receptors belonging to all thesegroups as indicated in Table 1 Sequences were retrievedfrom the NCBI GenBank (httpwwwncbinlmnihgov)We selected the splice variants of these genes which showedsuperimposable exon-intron arrangements in Mouse andHuman and exons aligning with nucleotide identities of 70or higher Eventually 216 MouseHuman couples of intronswere extracted In addition we compared these canonicaltranscripts with the Mouse and Human transcript variants ofthe same genes

Nucleotide sequence alignments without gaps of eachcouple of orthologous introns were made using a programavailable at httpmultalintoulouseinrafrmultalin [36]

All probabilistic comparisons between percentages weremade using the binomial distribution (one-tailed tests) withthe original actual figures [37]

3 Results

31 Nucleotide Identities at Corresponding Positions at theEnds of Mouse and Human Introns Nucleotide identitieswere recorded in orderly manner in the first 50 nucleotidesof each MouseHuman couple of introns (Figure 1) All the216 couples of introns considered started with the canonicaldinucleotideGTand identitywas of course 100at positions1 and 2 Then identities averaged 843 903 935 and

International Journal of Evolutionary Biology 3

0

25

50

75

100

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50

Figure 1 Nucleotide identities () in the first 50 nucleotides of eachMouseHuman couple of orthologous introns indicating the probabilityof nucleotide conservation at any given position

0

25

50

75

100

50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

Figure 2 Nucleotide identities () in the last 50 nucleotides of eachMouseHuman couple of orthologous introns indicating the probabilityof nucleotide conservation at any given position

736 at positions 3 4 5 and 6 respectively In positions 7 to50 the average identities were lower and remained relativelyconstant with a mean value of 554 (the horizontal line inFigure 1)

Nucleotide identities were also recorded in the last 50nucleotides (numbered from 50 to 1) of each MouseHumancouple of introns (Figure 2) All couples of introns endedwiththe canonical dinucleotide AG and thus identity was 100at positions 1 and 2 Then identities averaged 773 597727 and 722 at positions 3 4 5 and 6 respectively Inpositions 7 to 21 the average identities remained relativelyconstant with a mean value of 631 (the right horizontalline in Figure 2) In the last positions (22 to 50) the averageidentities remained relatively constant with a mean value of560 (the left horizontal line in Figure 2)

From these preliminary data as well as from otherliterature data [25] the initial and terminal hexanucleotides ofthe introns appeared to be the best characterized componentsof the 51015840ss and the 31015840ss respectively We thus undertook amore detailed analysis of these sequences

32 The Initial Hexanucleotides of the Introns The possiblenumber of hexanucleotides beginningwithGT is 44 = 256 Ofthese 51 (cumulatively in Mouse and Human) were found inour sample which does not exclude the possibility that otherhexanucleotides are used in other genes The more frequentinitial hexanucleotides are listed in Table 2 The per centincidences of these hexanucleotides inMouse andHuman donot differ significantly (119875 gt 005)

The nucleotide composition according to the position inthe initial hexanucleotides in Mouse and Human is shown in

Table 3 columns 1ndash4 The per cent incidences of the differentnucleotides in Mouse and Human at each position did notdiffer significantly (119875 gt 005) except in the case of the sixthnucleotide where C was significantly more represented inMouse as compared to Human

33 The Terminal Hexanucleotide of the Introns Of the 256possible hexanucleotides ending with AG 94 (cumulativelyin Mouse and Human) were found in our sample whichdoes not exclude the possibility that other hexanucleotidesare used in other genes The more represented terminalhexanucleotides are listed in Table 4 The per cent incidencesof these hexanucleotides in Mouse and Human did not differsignificantly (119875 gt 005)

The nucleotide composition according to the positionin the terminal hexanucleotides in Mouse and Human isshown in Table 5 columns 1ndash4The per cent incidences of thedifferent nucleotides in Mouse and Human at each positiondid not differ significantly (119875 gt 005) except in the case of thefirst nucleotide where G was significantly more representedin Mouse as compared to Human It is noteworthy that inour sample the nucleotide G was never present in the fourthposition so that no hexanucleotide ended by GAG

The limits of the polypyrimidine tract are not well definedand the two leftmost nucleotides (C- and T-rich positions 1and 2 Table 4) of the terminal hexanucleotide might in factbe the rightmost part of the polypyrimidine tract

34 The ldquoBifunctionalrdquo (Initial or Terminal) Hexanucleotidesof the Introns All the 16 (42) hexanucleotides beginningby GT and ending by AG may in theory act as both 51015840ss

4 International Journal of Evolutionary Biology

Table 1 The cytokine receptors analyzed in this study (1) is theimmunoglobulin superfamily receptors (2) is the class I cytokinereceptor family (3) is the class II cytokine receptor family (inter-feron receptor family) (4) is the tumor necrosis factor receptorsuperfamily (5) is the chemokine receptor family (6) is thetransforming growth factor beta receptor family

RECEPTOR (Receptor family) (Number of exons)Interleukin 1 receptor type I (IL1R1) (1) (10 exons)C-Kit Hardy-Zuckerman 4 Feline Sarcoma Viral OncogeneHomolog (KIT) (1) (21 exons)Interleukin 2 receptor alpha (IL2RA) (2) (8 exons)Interleukin 2 receptor beta (IL2RB) (2) (9 exons)Interleukin 2 receptor gamma (IL2RG) (2) (8 exons)Interleukin 4 receptor (IL4R) (2) (9 exons)Interleukin 13 receptor alpha 1 (IL13RA1) (2) (11 exons)Interleukin 13 receptor alpha 2 (IL13RA2) (2) (9 exons)Erythropoietin receptor (EPOR) (2) (8 exons)Colony stimulating factor 2 receptor beta low-affinity(granulocyte-macrophage) (CSF2RB) (2) (13 exons)Colony stimulating factor 3 receptor (granulocyte) (CSF3R) (2)(15 exons)Leukemia inhibitory factor receptor alpha (LIFR) (2) (19 exons)Prolactin receptor (PRLR) (2) (8 exons)Oncostatin M receptor (OSMR) (2) (17 exons)Interleukin 10 receptor alpha (IL10RA) (3) (7 exons)Interleukin 10 receptor beta (IL10RB) (3) (7 exons)Interferon (alpha beta and omega) receptor 1 (IFNAR1) (3) (11exons)CD27 molecule (CD27) (4) (6 exons)CD40 molecule TNF receptor superfamily member 5 (CD40) (4)(9 exons)Lymphotoxin beta receptor (TNFR superfamily member 3)(LTBR) (4) (10 exons)Chemokine (C-C motif) receptor 7 (CCR7) (5) (3 exons)Chemokine (C-C motif) receptor 9 (CCR9) (5) (2 exons)Chemokine (C-C motif) receptor 10 (CCR10) (5) (2 exons)Chemokine (C-X-C motif) receptor 4 (CXCR4) (5) (2 exons)Chemokine (C-X-C motif) receptor 5 (CXCR5) (5) (2 exons)Transforming growth factor beta receptor III (TGFBR3) (6) (16exons)

and 31015840ss (ldquobifunctionalrdquo hexanucleotides) Of these 8 neverappeared as initial or terminal hexanucleotides in our sampleand 3 (GTAAAG GTACAG and GTTAAG) appeared at thebeginning (but never at the end) of some introns while 5(GTCCAG GTGCAG GTGTAG GTTCAG and GTTTAG)appeared at the end (but never at the beginning) of otherintrons in both Mouse and Human

35 Conservation of the Individual Nucleotides in the Ini-tial Hexanucleotides of the Introns The structural conserva-tionmutation in the couples of MouseHuman orthologousinitial hexanucleotides was studied In more than half of

Table 2 The initial more frequently expressed hexanucleotidesPercentages are over the total of 216 couples of hexanucleotides

Sequence inMouse

inHuman

Average()

GTGAGT 190 181 185GTAAGT 190 171 181GTAAGA 88 88 88GTGAGA 60 51 56GTGAGG 42 69 56GTAAGG 37 51 44GTAAGC 42 28 35Total 649 639 645

0

10

20

30

40

50

60

0 1 2 3 4

Figure 3Abscissa number of nucleotide differences (irrespective ofthe position) in orthologousMouseHuman initial hexanucleotidesOrdinate percentage

the couples (574) there was a complete identity betweenthe Mouse and Human orthologous hexanucleotides Adecreasing percentage of couples exhibited one two or threenucleotide differences (regardless of position) in no instanceall four variable nucleotides changed (Figure 3) The numberof changes per hexanucleotide averaged 058

We calculated the probability of random occurrenceof the same nucleotide at a given position in orthologousMouseHuman hexanucleotides (Table 3 column 5) Forinstance A in the third position is present with probability0597 in the Mouse and 0570 in the Human (Table 3GTAxxx columns 2 and 3) then the probability of randomoccurrence of A at that position in both Mouse and Humanis 0597 times 0570 = 03403 (or 3403) (binomial distribution119899 = 216) The actual occurrence () of nucleotide identitiesat each position is reported in column 6 of Table 3 Thepercentages of actual identities were always significantly (119875 lt001) higher than the percentages of random identities andthe difference expresses the level of nucleotide biologicalconservation In the nucleotides of the initial hexanucleotidesthe actual total conservation was about 53 higher than thecalculated purely random conservation

36 Conservation of Groups of Nucleotides in the InitialHexanucleotides of the Introns The contribution of groupsof bases in the initial hexanucleotides to enhancing or

International Journal of Evolutionary Biology 5

Table 3 Columns 1ndash4 the nucleotide composition according to the position in the initial hexanucleotides in Mouse and Human Columns1 and 5ndash7 analysis of the conservation of individual nucleotides in the initial hexanucleotides of the introns Expected random conservationin column 5 and actual conservation in column 6 Percentages are over the total of 216 couples of hexanucleotides

1 2 3 4 5 6 7Sequence in Mouse in Human Random conservation () Actual conservation ()GTAxxx 597 575 3403 5093 119875 lt 001

GTCxxx 19 23 004 185 119875 lt 001

GTGxxx 370 384 1421 3009 119875 lt 001

GTTxxx 14 23 003 139 119875 lt 001

Total 4831 8426 119875 lt 001

GTxAxx 824 810 6674 7778 119875 lt 001

GTxCxx 28 28 008 231 119875 lt 001

GTxGxx 69 83 057 417 119875 lt 001

GTxTxx 79 79 062 602 119875 lt 001

Total 6801 9028 119875 lt 001

GTxxAx 79 74 058 648 119875 lt 001

GTxxCx 14 18 003 093 119875 lt 001

GTxxGx 852 843 7182 8241 119875 lt 001

GTxxTx 55 65 036 370 119875 lt 001

Total 7279 9352 119875 lt 001

GTxxxA 218 241 525 1620 119875 lt 001

GTxxxC 116 74 119875 = 002 086 370 119875 lt 001

GTxxxG 166 199 330 833 119875 lt 001

GTxxxT 500 486 2430 4537 119875 lt 001

Total 3371 736 119875 lt 001

Table 4 The terminal more frequently expressed hexanucleotidesPercentages are over the total of 216 couples of hexanucleotides

Sequence in Mouse in Human Average ()TTTCAG 74 65 69CTGCAG 79 46 63TTTTAG 51 60 56TTGCAG 37 69 53TTCTAG 32 65 49CCACAG 46 32 39TTCCAG 51 28 39Total 370 365 368

reducing the level of MouseHuman conservation was thenstudied The random probability of occurrence of completehexanucleotides or groups of three or two nucleotides atgiven positions was calculated on the basis of the percentagesof the actual conservation of the individual componentnucleotides at the corresponding positions as reported inTable 3 (column 6) For example the random probability ofthe hexanucleotide GTAAGT is

10000 times 10000 times 05093 times 07778

times 08241 times 04537 = 01481 (or 1481)(1)

(binomial distribution 119899 = 216) and the random probabilityof the sequence GTAAGx is

10000 times 10000 times 05093 times 07778 times 08241

= 03265 (or 3265) (2)

The expected random probabilities were compared withthe actual frequencies Analysis of the initial hexanu-cleotides showed that the sequences GTGAGx GTGAxTand GTGxGT are significantly more expressed than could beexpected (Table 6) Analysis of all the hexanucleotides whichincluded these sequences demonstrated that in GTGAGT(which included all three overexpressed sequences) the over-expression was highly significant (expected 875 actual1389 119875 lt 001) overexpression of GTGAGG (whichincluded the GAG motif only) was less significant possiblydue to the lower frequency (expected 161 actual 324119875 = 006) None of the other hexanucleotides whichincluded some of the overexpressed sequences exhibitedsignificantly different expression levels from those expecteddue to the fact that they contained other not overexpressed orunderexpressed motifs

Sequences GTAAGx and GTAxGG were found to besignificantly less expressed than could be expected (Table 6)The hexanucleotide GTAAGG which included both under-expressed sequences was clearly underexpressed although ata low significance level due to the low frequency (expected272 actual 093 119875 = 007) All other hexanucleotideswhich included some of the underexpressed sequences didnot exhibit significantly different expression levels from thoseexpected

Another observation supports the role of the trinu-cleotides GAG and AAG in the association to overexpres-sion and underexpression respectively The hexanucleotidesGTGAGT and GTAAGT differ at the third nucleotide onlyAs at the third nucleotide the actual conservation of Ais 169 times the actual conservation of G (50933009

6 International Journal of Evolutionary Biology

Table 5 Columns 1ndash4 the nucleotide composition according to the position in the terminal hexanucleotides inMouse andHuman Columns1 and 5ndash7 analysis of the conservation of individual nucleotides in the terminal hexanucleotides of the introns Expected random conservationin column 5 and actual conservation in column 6 Percentages are over the total of 216 couples of hexanucleotides

1 2 3 4 5 6 7Sequence in Mouse in Human Random conservation () Actual conservation ()AxxxAG 51 56 029 231 119875 lt 001

CxxxAG 352 338 119 2361 119875 lt 001

GxxxAG 83 46 119875 = 001 038 370 119875 lt 001

TxxxAG 514 560 2878 4259 119875 lt 001

Total 4135 7221 119875 lt 001

xAxxAG 79 93 073 509 119875 lt 001

xCxxAG 259 222 575 1389 119875 lt 001

xGxxAG 83 83 069 509 119875 lt 001

xTxxAG 579 602 3486 4861 119875 lt 001

Total 4203 7268 119875 lt 001

xxAxAG 273 269 734 1713 119875 lt 001

xxCxAG 273 291 794 1667 119875 lt 001

xxGxAG 199 213 424 926 119875 lt 001

xxTxAG 255 227 579 1667 119875 lt 001

Total 2531 5973 119875 lt 001

xxxAAG 74 70 052 602 119875 lt 001

xxxCAG 630 597 376 5046 119875 lt 001

xxxGAG 00 00 000 000xxxTAG 296 333 986 2083 119875 lt 001

Total 4798 7731 119875 lt 001

Table 6 Conservation of groups of nucleotides (underlined) in the initial hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

GTGAGx 1929 2546 617 002GTGAxT 1062 1435 373 005GTGxGT 1125 1528 403 004

GTAAGx 3265 2639 minus626 003GTAxGG 350 093 minus257 002

Table 3) GTAAGT could be expected to be more expressedthan GTGAGT by about 70 On the contrary GTGAGTwas actually more expressed than GTAAGT by about 2(185181 Table 2) and indeed it was the most frequentlyexpressed initial hexanucleotide Similarly GTAAGG shouldbe more expressed than GTGAGG by 70 but actuallyGTGAGG was more expressed than GTAAGG by about 27(5644 Table 2)

37 Conservation of the Individual Nucleotides in the TerminalHexanucleotides of the Introns Conservationmutation inthe couples of MouseHuman orthologous terminal hexanu-cleotides was also studied In 282 only of the couples wasthere a complete identity between the Mouse and Humanorthologous hexanucleotides In a higher percentage of cases(379) one nucleotide (regardless of position) was changedA decreasing percentage of couples exhibited two or threenucleotide differences and in only 09 of cases were allfour variable nucleotides changed (Figure 4) The number ofchanges per hexanucleotide averaged 117

The probability of random occurrence of the samenucleotide at a given position in orthologous MouseHumanhexanucleotides was calculated as previously reported(Table 5 column 5) The actual occurrence () of nucleotideidentities at each position is reported in column 6 of Table 5The percentages of actual identities were always significantly(119875 lt 001) higher than the percentages of random identitiesIn the nucleotides of the terminal hexanucleotides the actualtotal conservation was about 80 higher than the calculatedpurely random conservation

38 Conservation of Groups of Nucleotides in the TerminalHexanucleotides of the Introns The contribution of groupsof bases in the terminal hexanucleotides in favoring theMouseHuman conservation was studied by comparing theirexpected random probabilities with the observed actualfrequencies (see previous paragraph for the computing ofrandom probabilities and Table 5 column 6 for the actualconservation of the individual nucleotides at the differentpositions)

International Journal of Evolutionary Biology 7

Table 7 Conservation of groups of nucleotides (underlined) in the terminal hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

TTTxAG 345 plusmn 268 880 535 lt001TTxTAG 431 plusmn 298 926 495 lt001xTTTAG 169 plusmn 189 463 294 lt001TxTTAG 148 plusmn 177 417 269 lt001xTGCAG 227 plusmn 219 509 282 001

TTxxAG 2070 plusmn 595 2963 893 lt001xxTTAG 347 plusmn 269 648 301 002xTGxAG 450 plusmn 305 833 383 lt001

0

10

20

30

40

50

60

0 1 2 3 4

Figure 4 Abscissa number of nucleotide differences (irrespectiveof the position) in orthologous MouseHuman terminal hexanu-cleotides Ordinate percentage

Analysis of the terminal hexanucleotides showed thatthe nucleotide sequences TTTxAG TTxTAG xTTTAGTxTTAG and xTGCAG are significantly more expressedthan could be expected (Table 7) Furthermore some dinu-cleotides derived from the above sequences are themselvesoverexpressed TTxxAG (derived from TTTxAG and TTx-TAG) xxTTAG (derived from xTTTAG and TxTTAG) andxTGxAG (derived from xTGCAG) (Table 7)

Three hexanucleotides were found to be significantlyoverexpressed as compared to the random probabilityTTTCAG (expected 174 actual 463 119875 lt 001)TTTTAG (expected 072 actual 370 119875 lt 001) andCTGCAG (expected 054 actual 185 119875 = 003) Allthese hexanucleotides contained at least one of the conservedmotifs Other hexanucleotides although containing some ofthe overexpressed trinucleotides did not exhibit significantlydifferent conservation levels from those of the randomprobability and in no case did the occurrence of one of theconserved dinucleotides determine a higher conservation ofthe corresponding hexanucleotides

No motif associated with a significant underexpressionwas found

39 Splicing Variants Splicing variants of the individualgenes are usually of a different type in Mouse and Humanso the positions of the splicing sites no longer correspondin the two species (ie the orthology of the splicing sites is

lost) even though the main traits of the protein products arepreserved In only one instance in receptor KIT Mouse andHuman exhibited the same type of variant that is the use ofanother 51015840ss 12 nucleotides upstream of the canonical site

In several cases one entire exon present in the canonicalform is lacking in a variant although the other downstreamexons are regularly transcribed indicating that the constitu-tive 31015840ss at the end of the preceding intron did not operate Asan example in the transcript variant X1 (XM 005252446) ofthe Human IL2RA the canonical 31015840ss TTCCAG immediatelyupstream of exon 4 is silenced and the fourth exon (216 nt72 aa) of the canonical sequence (NM 000417) is lackingin this variant In other cases up to three consecutive 31015840sssare silenced in the Human IL2RG transcript variant X2(XM 005262262) 31015840sss ATCTAG CTCTAG and CTCCAGare all silenced with loss of exons 2 3 and 4 while exons5 6 7 and 8 are transcribed as in the canonical form(NM 000206) without alteration of the reading frame

In other cases the active alternative ss may be locatedupstream or downstream of the canonical ss resulting in analteration of the length of individual exons usually withoutframeshifts For example in theHuman variant X1 of CSF2RB(XM 005261340) the intron upstream of exon 7 ends byan ldquoearlyrdquo CCTCAG while the corresponding intron in thecanonical sequence (NM 007780) ends at another CCTCAG18 nt downstream resulting in an 18-nt longer exon in thevariant Although variations of the 31015840sss are more frequentthe 51015840sss may also be variable As an example in Humantranscript variant X1 of CD40 (XM 005260617) the 51015840ss afterexon 7 is GTGGGG replacing the GTGAGT of the canonicalvariant (NM 001250) which occurs 12 nt upstream

Detailed results are shown in the SupplementaryMaterialavailable at httpdxdoiorg1011552013818954

310 The Binding of the U1 snRNA We studied the ungappedbase-pairing between the U1 snRNA sequence ACTTAC(NCBI GenBank accession code NR 004430) conserved inMouse and Human and the last three nucleotides of theexons plus the corresponding 51015840ss hexanucleotide The 51015840ssGTAAGT which exactly base-pairs with the U1 is expressedin 190 of cases in Mouse and 171 of cases in Human(Table 2) In all other cases the match is not perfect Onaverage a complementary match between the U1 nucleotidesand the four variable nucleotides of the 51015840ss is achieved in

8 International Journal of Evolutionary Biology

67 of nucleotides The highest match is with the secondvariable nucleotide (the fourth of the 51015840ss) with 78 ofoccurrences while the matching of all the other variablenucleotides averages 50 Indeed all seven more expressed51015840ss of Mouse and Human have an ldquoArdquo in the fourth position(Table 2) The lowest matching is with the sixth nucleotideHowever the number of matches in the individual 51015840sss isvery variable and in a few cases (eg GTTTCG) only theinitial GTmatcheswithU1 In very few caseswe observed thatthe optimal ungapped base-pairing of U1 could be achievedby partly binding the last three nucleotides of the exon

4 Discussion

This study was aimed at describing the conservationmuta-tion dynamics of the intron ends of the cytokine receptorgenes during the speciation processes to Mouse and Humanwhich began starting from a common ancestor 65ndash85 MYA[32 33] We selected 26 orthologous genes of the two speciesin which the MouseHuman topographic correspondence ofexons had been consistently preserved The selected genescoded for receptors belonging to different cytokine receptorgroups (Table 1)We firstly identified the nucleotide identitiesat the corresponding positions in Mouse and Human in thefirst 50 and last 50 nucleotides of each intron All intronsstudied (216 couples) started and ended with the canonicaldinucleotides GT and AG respectively conforming to theusual situation This preliminary analysis demonstrated thatin general the four nucleotides following GT at the 51015840ss andthe four nucleotides preceding AG at the 31015840ss were morehighly conserved as compared to the other nucleotides atboth ends of the introns (Figures 1-2) We thus undertooka more detailed analysis of the initial and terminal intronichexanucleotides

All the 16 hexanucleotides beginning by GT and endingby AG might act both as 51015840ss and 31015840ss In our sample 8 ofthese sequences appeared either at 51015840 or at 31015840 but never inboth It may thus be hypothesized that some mechanism ofdisambiguation may operate possibly related to other exonicor intronic ldquosignalsrdquo involved in the splicing processes

The structure of the hexanucleotides actually expressed atthe beginning and end of the introns is quite variable even ifsome configurations are more frequent (Tables 2 and 4) Inour sample the initial hexanucleotide appeared in 51 differentconfigurations while the terminal hexanucleotide appearedin 94 different configurations indicating a lower selectivepressure on the terminal hexanucleotideThis is confirmed bythe MouseHuman conservationmutation data the averagemutation per hexanucleotide was 058 in the initial hexanu-cleotides and 117 in the terminal hexanucleotides and thetotal hexanucleotide conservation was 574 for initial hex-anucleotides and only 282 for terminal hexanucleotides

The percentages of occurrence of the more expressedinitial and terminal hexanucleotides did not differ signifi-cantly inMouse and Human (Tables 2 and 4)The percentageof the individual nucleotides at each position in the initialand terminal hexanucleotides was also similar in Mouseand Human and our results are in general comparable to

those reported in the literature [25 27 28] In our samplein the sixth nucleotide of the initial hexanucleotides C wassignificantly more expressed in the Mouse than in Humanand in the first nucleotide of the terminal hexanucleotidesG was significantly more expressed in the Mouse than inHuman (Tables 3 and 5) In our sample the nucleotideG was never present in the fourth position of terminalhexanucleotides

From the data on the actual frequencies of the indi-vidual nucleotides at each position in the initial and ter-minal hexanucleotides in Mouse and Human separately wecalculated the probability of random conservation of anygiven nucleotide at a given position in both species Theresults are shown in Table 3 (initial hexanucleotides) andTable 5 (terminal hexanucleotides) together with the resultsobtained on the actual conservation of each nucleotide ata given position in both Mouse and Human As shown inthe Tables the actual conservation was always significantlyhigher than the random conservation On average the actualconservation exceeded by 53 the random conservationin the initial hexanucleotides and by 80 in the terminalhexanucleotides These ldquogainsrdquo express the conservation dueto a selective evolutionary pressure to keep the configurationof the common ancestor at the level of the individualnucleotides

We also analyzed the conservation of groups of the vari-able nucleotides in initial and terminal hexanucleotides Therandom conservation of groups of three or two nucleotideswas calculated and compared with the actual frequenciesAll the motifs of the initial hexanucleotides found to beassociated with a higher or a lower conservation are reportedin the Results section The most remarkably conservedtrinucleotide was xxGAGx while xxAAGx was significantlyunderexpressed Accordingly GTGAGT and GTGAGGwereoverexpressed while GTAAGG was underexpressed Since inthe third position A is more conserved than G by about70 GTGAGT and GTGAGG should be less expressedthan GTAAGT and GTAAGG respectively by a similarpercentage Actually GTGAGT and GTGAGG were moreexpressed than the similar sequences with A in the thirdposition so that in our sample GTGAGT was the mostfrequently expressed initial hexanucleotide and GTAAGTwas only the second most frequently occurring sequence(Table 2)

In the terminal hexanucleotides the sequences TTTxAGand xTGCAG were overexpressed and accordinglyTTTCAG TTTTAG and CTGCAG were more expressedthan the random expectance These three sequences arethe most frequently expressed terminal hexanucleotides(Table 4) No specific motif entailing terminal nucleotideunderexpression could be demonstrated

It should be remarked that the presence of overex-pressed or underexpressed trinucleotides seems to be acondition which although necessary is not sufficient perse to entail over- or underexpression of the correspondinghexanucleotides since the outcome also depends on the othervariable nucleotide

The present results suggest that the conservation of eachnucleotide of the initial and terminal hexanucleotides is

International Journal of Evolutionary Biology 9

dependent upon the conservationmutation of the othernucleotides and an evolutionary selective pressure oper-ates to keep certain global configurations derived from theMouseHuman common ancestor while other configurationstend to be less conserved In particular the initial con-figuration GTAAGT which corresponded to the maximalconservation of the individual nucleotides at positions 3 to6 (Table 3 column 6) and is the one which exactly base-pairswith U1 snRNA appeared to be negatively biased

In general the requirement for base-pairing of the initialhexanucleotide with U1 seems not to be stringent since inour sample complete matching was observed in 18 only ofthe 51015840sss and 51015840sss with as few as three matching nucleotidesbeing effective (13 of cases) In very few cases optimal base-pairing ofU1 could be achieved by partly binding the terminaltrinucleotide of the exon

Analysis of the transcript variants of the genes consideredin the present study revealed that no type of variant iscommon in Mouse and Human except in one single caseThis suggests that these variants were not inherited fromthe common ancestor but emerged only later during thespeciation process One frequent variant is the silencingof a constitutive 31015840ss which leads to the skipping of theimmediately downstream exon with transcription being thenresumed for the other downstream exons Another type ofvariant is the activation of an alternative 51015840ss or 31015840ss usuallylocated a few nucleotides upstream or downstream of thecanonical splicing site the latter being silenced This leadsto alterations of the nucleotide number of the neighboringexons but usually without frameshifts Most constitutive sssespecially at 51015840 are robust enough to be maintained steadilyactive but 104 of the 31015840sss and 32 of the 51015840sss were foundto be variable Our analysis could not reveal any specific trendtowards the silencing of constitutive sss or the activation ofalternative sss as certain hexanucleotides which are silencedat given sites are on the contrary activated at other sites toreplace constitutive sss as for example the 51015840sss GTGAGAand GTGGGA and the 31015840ss CTCCAG Possibly silencingor activation may depend in a complex way on the generalcontext [20]

To conclude analysis of the nucleotide conservation inthe 51015840ss and 31015840ss hexanucleotides of Mouse and Humanintrons reveals definite evolutionary positive biases towardsthe conservation of some specific configurations and neg-ative biases against other configurations However the 51015840sshexanucleotides and especially the 31015840ss hexanucleotides stillexhibit a wide structural variability confirming the con-tention that splicing depends on a complex code whichbesides the 51015840ss and 31015840ss possibly involves additional signalsfrom the neighboring exons or from sections of the intronother than the splice sites and might also be regulated by thesecondary and tertiary structures forming in the pre-mRNAmolecule [12ndash14 16ndash23]

Acknowledgments

The authors gratefully thank Doctor Mary Victoria CandacePragnell and Professor Vincenzo Mitolo for constructivecriticisms and reviewing the paper

References

[1] S Stamm S Ben-Ari I Rafalska et al ldquoFunction of alternativesplicingrdquo Gene vol 344 pp 1ndash20 2005

[2] A J Matlin F Clark and C W J Smith ldquoUnderstanding alte-rnative splicing towards a cellular coderdquoNature ReviewsMolec-ular Cell Biology vol 6 no 5 pp 386ndash398 2005

[3] M Hallegger M Llorian and C W J Smith ldquoAlternativesplicing global insights minireviewrdquo FEBS Journal vol 277 no4 pp 856ndash866 2010

[4] E Conti and E Izaurralde ldquoNonsense-mediated mRNA decaymolecular insights and mechanistic variations across speciesrdquoCurrent Opinion in Cell Biology vol 17 no 3 pp 316ndash325 2005

[5] F Lejeune and L E Maquat ldquoMechanistic links between non-sense-mediated mRNA decay and pre-mRNA splicing in mam-malian cellsrdquo Current Opinion in Cell Biology vol 17 no 3 pp309ndash315 2005

[6] J Weischenfeldt J Lykke-Andersen and B Porse ldquoMessengerRNA surveillance neutralizing natural nonsenserdquoCurrent Biol-ogy vol 15 no 14 pp R559ndashR562 2005

[7] E V Koonin ldquoThe origin of introns and their role in eukaryo-genesis a compromise solution to the introns-early versusintrons-late debaterdquoBiologyDirect vol 1 no 22 pp 1ndash23 2006

[8] J P Orengo and T A Cooper ldquoAlternative splicing in diseaserdquoAdvances in Experimental Medicine and Biology vol 623 pp212ndash223 2007

[9] A S Solis N Shariat and J G Patton ldquoSplicing fidelity enhan-cers and diseaserdquo Frontiers in Bioscience vol 13 no 5 pp 1926ndash1942 2008

[10] C A Pettigrew and M A Brown ldquoPre-mRNA splicing aberra-tions and cancerrdquo Frontiers in Bioscience vol 13 no 3 pp 1090ndash1105 2008

[11] M A Panaro V Mitolo A Cianciulli P Cavallo C I MitoloandAAcquafredda ldquoTheHIV-1 Rev binding family of proteinsthe dog proteins as a study modelrdquo Endocrine Metabolic andImmune Disorders vol 8 no 1 pp 30ndash46 2008

[12] M D Adams D Z Rudner and D C Rio ldquoBiochemistry andregulation of pre-mRNA splicingrdquo Current Opinion in CellBiology vol 8 no 3 pp 331ndash339 1996

[13] A R Krainer Eukaryotic mRNA Processing Oxford UniversityPress New York NY USA 1997

[14] C B Burge T Tuschl and P A Sharp Splicing of Precursors tomRNAs by the Spliceosome Cold Spring Laboratory HarborPress Cold Spring Harbor NY USA 1999

[15] B J Blencowe ldquoExonic splicing enhancers mechanism ofaction diversity and role in human genetic diseasesrdquo Trends inBiochemical Sciences vol 25 no 3 pp 106ndash110 2000

[16] X H Zhang and L A Chasin ldquoComputational definition ofsequence motifs governing constitutive exon splicingrdquo Genesand Development vol 18 no 11 pp 1241ndash1250 2004

[17] J Hui L Hung M Heiner et al ldquoIntronic CA-repeat and CA-rich elements a new class of regulators of mammalian alterna-tive splicingrdquoEMBO Journal vol 24 no 11 pp 1988ndash1998 2005

[18] J L Kabat S Barberan-Soler P McKenna H Clawson TFarrer and A M Zahler ldquoIntronic alternative splicing regula-tors identified by comparative genomics in nematodesrdquo PLoSComputational Biology vol 2 no 7 p e86 2006

[19] C Dominguez and F H-T Allain ldquoNMR structure of the threequasi RNA recognition motifs (qRRMs) of human hnRNP Fand interaction studies with Bcl-x G-tract RNA a novel modeof RNA recognitionrdquo Nucleic Acids Research vol 34 no 13 pp3634ndash3645 2006

10 International Journal of Evolutionary Biology

[20] L A Chasin ldquoSearching for splicing motifsrdquo Advances in Expe-rimental Medicine and Biology vol 623 pp 85ndash106 2007

[21] LHHungMHeiner J Hui S Schreiner V Benes andA Bin-dereif ldquoDiverse roles of hnRNP L in mammalian mRNA proc-essing a combined microarray and RNAi analysisrdquo RNA vol14 no 2 pp 284ndash296 2008

[22] M A Panaro A Cianciulli R Calvello et al ldquoAn analysis of thehuman chemokine CXC receptor 4 generdquo Immunopharmacol-ogy and Immunotoxicology vol 31 no 1 pp 88ndash93 2009

[23] J Zhang C C Kuo and L Chen ldquoGC content around splicesites affects splicing through pre-mRNA secondary structuresrdquoBMC Genomics vol 12 no 90 pp 1ndash11 2011

[24] TWNilsen ldquoThe spliceosome themost complexmacromolec-ularmachine in the cellrdquoBioEssays vol 25 no 12 pp 1147ndash11492003

[25] J F Abril R Castelo and R Guigo ldquoComparison of splice sitesin mammals and chickenrdquo Genome Research vol 15 no 1 pp111ndash119 2005

[26] N Sheth X Roca M L Hastings T Roeder A R Krainer andR Sachidanandam ldquoComprehensive splice-site analysis usingcomparative genomicsrdquo Nucleic Acids Research vol 34 no 14pp 3955ndash3967 2006

[27] S H Schwartz J Silva D Burstein T Pupko E Eyras and GAst ldquoLarge-scale comparative analysis of splicing signals andtheir corresponding splicing factors in eukaryotesrdquo GenomeResearch vol 18 no 1 pp 88ndash103 2008

[28] E V Kriventseva and M S Gelfand ldquoStatistical analysis of theexon-intron structure of higher and lower eukaryote genesrdquoJournal of Biomolecular Structure and Dynamics vol 17 no 2pp 281ndash288 1999

[29] M Burset I A Seledtsov and V V Solovyev ldquoSpliceDB dat-abase of canonical and non-canonical mammalian splice sitesrdquoNucleic Acids Research vol 29 no 1 pp 255ndash259 2001

[30] M Burset I A Seledtsov and V V Solovyev ldquoAnalysis ofcanonical and non-canonical splice sites in mammaliangenomesrdquoNucleic Acids Research vol 28 no 21 pp 4364ndash43752000

[31] V Shepelev and A Fedorov ldquoAdvances in the exon-intron dat-abase (EID)rdquo Briefings in Bioinformatics vol 7 no 2 pp 178ndash185 2006

[32] M Foote J P Hunter C M Janis and J J Sepkoski Jr ldquoEvolu-tionary and preservational constraints on origins of biologicgroups divergence times of eutherian mammalsrdquo Science vol283 no 5406 pp 1310ndash1314 1999

[33] M S Lee ldquoMolecular clock calibrations and metazoan diver-gence datesrdquo Journal of Molecular Evolution vol 49 no 3 pp385ndash391 1999

[34] K M Murphy P Travers and M Walport Janewayrsquos Immuno-biology Garland Science Oxford UK 7th edition 2007

[35] R Coico and G Sunshine Immunology A Short Course Wiley-Blackwell Hoboken NJ USA 6th edition 2008

[36] F Corpet ldquoMultiple sequence alignmentwith hierarchical clust-eringrdquo Nucleic Acids Research vol 16 no 22 pp 10881ndash108901988

[37] M L Samuels and J A Witmer Statistics for the Life SciencesPrentice Hall Upper Saddle River NJ USA 3rd edition 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor …downloads.hindawi.com/archive/2013/818954.pdf · 2019. 7. 31. · transforming growth factor beta

International Journal of Evolutionary Biology 3

0

25

50

75

100

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50

Figure 1 Nucleotide identities () in the first 50 nucleotides of eachMouseHuman couple of orthologous introns indicating the probabilityof nucleotide conservation at any given position

0

25

50

75

100

50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

Figure 2 Nucleotide identities () in the last 50 nucleotides of eachMouseHuman couple of orthologous introns indicating the probabilityof nucleotide conservation at any given position

736 at positions 3 4 5 and 6 respectively In positions 7 to50 the average identities were lower and remained relativelyconstant with a mean value of 554 (the horizontal line inFigure 1)

Nucleotide identities were also recorded in the last 50nucleotides (numbered from 50 to 1) of each MouseHumancouple of introns (Figure 2) All couples of introns endedwiththe canonical dinucleotide AG and thus identity was 100at positions 1 and 2 Then identities averaged 773 597727 and 722 at positions 3 4 5 and 6 respectively Inpositions 7 to 21 the average identities remained relativelyconstant with a mean value of 631 (the right horizontalline in Figure 2) In the last positions (22 to 50) the averageidentities remained relatively constant with a mean value of560 (the left horizontal line in Figure 2)

From these preliminary data as well as from otherliterature data [25] the initial and terminal hexanucleotides ofthe introns appeared to be the best characterized componentsof the 51015840ss and the 31015840ss respectively We thus undertook amore detailed analysis of these sequences

32 The Initial Hexanucleotides of the Introns The possiblenumber of hexanucleotides beginningwithGT is 44 = 256 Ofthese 51 (cumulatively in Mouse and Human) were found inour sample which does not exclude the possibility that otherhexanucleotides are used in other genes The more frequentinitial hexanucleotides are listed in Table 2 The per centincidences of these hexanucleotides inMouse andHuman donot differ significantly (119875 gt 005)

The nucleotide composition according to the position inthe initial hexanucleotides in Mouse and Human is shown in

Table 3 columns 1ndash4 The per cent incidences of the differentnucleotides in Mouse and Human at each position did notdiffer significantly (119875 gt 005) except in the case of the sixthnucleotide where C was significantly more represented inMouse as compared to Human

33 The Terminal Hexanucleotide of the Introns Of the 256possible hexanucleotides ending with AG 94 (cumulativelyin Mouse and Human) were found in our sample whichdoes not exclude the possibility that other hexanucleotidesare used in other genes The more represented terminalhexanucleotides are listed in Table 4 The per cent incidencesof these hexanucleotides in Mouse and Human did not differsignificantly (119875 gt 005)

The nucleotide composition according to the positionin the terminal hexanucleotides in Mouse and Human isshown in Table 5 columns 1ndash4The per cent incidences of thedifferent nucleotides in Mouse and Human at each positiondid not differ significantly (119875 gt 005) except in the case of thefirst nucleotide where G was significantly more representedin Mouse as compared to Human It is noteworthy that inour sample the nucleotide G was never present in the fourthposition so that no hexanucleotide ended by GAG

The limits of the polypyrimidine tract are not well definedand the two leftmost nucleotides (C- and T-rich positions 1and 2 Table 4) of the terminal hexanucleotide might in factbe the rightmost part of the polypyrimidine tract

34 The ldquoBifunctionalrdquo (Initial or Terminal) Hexanucleotidesof the Introns All the 16 (42) hexanucleotides beginningby GT and ending by AG may in theory act as both 51015840ss

4 International Journal of Evolutionary Biology

Table 1 The cytokine receptors analyzed in this study (1) is theimmunoglobulin superfamily receptors (2) is the class I cytokinereceptor family (3) is the class II cytokine receptor family (inter-feron receptor family) (4) is the tumor necrosis factor receptorsuperfamily (5) is the chemokine receptor family (6) is thetransforming growth factor beta receptor family

RECEPTOR (Receptor family) (Number of exons)Interleukin 1 receptor type I (IL1R1) (1) (10 exons)C-Kit Hardy-Zuckerman 4 Feline Sarcoma Viral OncogeneHomolog (KIT) (1) (21 exons)Interleukin 2 receptor alpha (IL2RA) (2) (8 exons)Interleukin 2 receptor beta (IL2RB) (2) (9 exons)Interleukin 2 receptor gamma (IL2RG) (2) (8 exons)Interleukin 4 receptor (IL4R) (2) (9 exons)Interleukin 13 receptor alpha 1 (IL13RA1) (2) (11 exons)Interleukin 13 receptor alpha 2 (IL13RA2) (2) (9 exons)Erythropoietin receptor (EPOR) (2) (8 exons)Colony stimulating factor 2 receptor beta low-affinity(granulocyte-macrophage) (CSF2RB) (2) (13 exons)Colony stimulating factor 3 receptor (granulocyte) (CSF3R) (2)(15 exons)Leukemia inhibitory factor receptor alpha (LIFR) (2) (19 exons)Prolactin receptor (PRLR) (2) (8 exons)Oncostatin M receptor (OSMR) (2) (17 exons)Interleukin 10 receptor alpha (IL10RA) (3) (7 exons)Interleukin 10 receptor beta (IL10RB) (3) (7 exons)Interferon (alpha beta and omega) receptor 1 (IFNAR1) (3) (11exons)CD27 molecule (CD27) (4) (6 exons)CD40 molecule TNF receptor superfamily member 5 (CD40) (4)(9 exons)Lymphotoxin beta receptor (TNFR superfamily member 3)(LTBR) (4) (10 exons)Chemokine (C-C motif) receptor 7 (CCR7) (5) (3 exons)Chemokine (C-C motif) receptor 9 (CCR9) (5) (2 exons)Chemokine (C-C motif) receptor 10 (CCR10) (5) (2 exons)Chemokine (C-X-C motif) receptor 4 (CXCR4) (5) (2 exons)Chemokine (C-X-C motif) receptor 5 (CXCR5) (5) (2 exons)Transforming growth factor beta receptor III (TGFBR3) (6) (16exons)

and 31015840ss (ldquobifunctionalrdquo hexanucleotides) Of these 8 neverappeared as initial or terminal hexanucleotides in our sampleand 3 (GTAAAG GTACAG and GTTAAG) appeared at thebeginning (but never at the end) of some introns while 5(GTCCAG GTGCAG GTGTAG GTTCAG and GTTTAG)appeared at the end (but never at the beginning) of otherintrons in both Mouse and Human

35 Conservation of the Individual Nucleotides in the Ini-tial Hexanucleotides of the Introns The structural conserva-tionmutation in the couples of MouseHuman orthologousinitial hexanucleotides was studied In more than half of

Table 2 The initial more frequently expressed hexanucleotidesPercentages are over the total of 216 couples of hexanucleotides

Sequence inMouse

inHuman

Average()

GTGAGT 190 181 185GTAAGT 190 171 181GTAAGA 88 88 88GTGAGA 60 51 56GTGAGG 42 69 56GTAAGG 37 51 44GTAAGC 42 28 35Total 649 639 645

0

10

20

30

40

50

60

0 1 2 3 4

Figure 3Abscissa number of nucleotide differences (irrespective ofthe position) in orthologousMouseHuman initial hexanucleotidesOrdinate percentage

the couples (574) there was a complete identity betweenthe Mouse and Human orthologous hexanucleotides Adecreasing percentage of couples exhibited one two or threenucleotide differences (regardless of position) in no instanceall four variable nucleotides changed (Figure 3) The numberof changes per hexanucleotide averaged 058

We calculated the probability of random occurrenceof the same nucleotide at a given position in orthologousMouseHuman hexanucleotides (Table 3 column 5) Forinstance A in the third position is present with probability0597 in the Mouse and 0570 in the Human (Table 3GTAxxx columns 2 and 3) then the probability of randomoccurrence of A at that position in both Mouse and Humanis 0597 times 0570 = 03403 (or 3403) (binomial distribution119899 = 216) The actual occurrence () of nucleotide identitiesat each position is reported in column 6 of Table 3 Thepercentages of actual identities were always significantly (119875 lt001) higher than the percentages of random identities andthe difference expresses the level of nucleotide biologicalconservation In the nucleotides of the initial hexanucleotidesthe actual total conservation was about 53 higher than thecalculated purely random conservation

36 Conservation of Groups of Nucleotides in the InitialHexanucleotides of the Introns The contribution of groupsof bases in the initial hexanucleotides to enhancing or

International Journal of Evolutionary Biology 5

Table 3 Columns 1ndash4 the nucleotide composition according to the position in the initial hexanucleotides in Mouse and Human Columns1 and 5ndash7 analysis of the conservation of individual nucleotides in the initial hexanucleotides of the introns Expected random conservationin column 5 and actual conservation in column 6 Percentages are over the total of 216 couples of hexanucleotides

1 2 3 4 5 6 7Sequence in Mouse in Human Random conservation () Actual conservation ()GTAxxx 597 575 3403 5093 119875 lt 001

GTCxxx 19 23 004 185 119875 lt 001

GTGxxx 370 384 1421 3009 119875 lt 001

GTTxxx 14 23 003 139 119875 lt 001

Total 4831 8426 119875 lt 001

GTxAxx 824 810 6674 7778 119875 lt 001

GTxCxx 28 28 008 231 119875 lt 001

GTxGxx 69 83 057 417 119875 lt 001

GTxTxx 79 79 062 602 119875 lt 001

Total 6801 9028 119875 lt 001

GTxxAx 79 74 058 648 119875 lt 001

GTxxCx 14 18 003 093 119875 lt 001

GTxxGx 852 843 7182 8241 119875 lt 001

GTxxTx 55 65 036 370 119875 lt 001

Total 7279 9352 119875 lt 001

GTxxxA 218 241 525 1620 119875 lt 001

GTxxxC 116 74 119875 = 002 086 370 119875 lt 001

GTxxxG 166 199 330 833 119875 lt 001

GTxxxT 500 486 2430 4537 119875 lt 001

Total 3371 736 119875 lt 001

Table 4 The terminal more frequently expressed hexanucleotidesPercentages are over the total of 216 couples of hexanucleotides

Sequence in Mouse in Human Average ()TTTCAG 74 65 69CTGCAG 79 46 63TTTTAG 51 60 56TTGCAG 37 69 53TTCTAG 32 65 49CCACAG 46 32 39TTCCAG 51 28 39Total 370 365 368

reducing the level of MouseHuman conservation was thenstudied The random probability of occurrence of completehexanucleotides or groups of three or two nucleotides atgiven positions was calculated on the basis of the percentagesof the actual conservation of the individual componentnucleotides at the corresponding positions as reported inTable 3 (column 6) For example the random probability ofthe hexanucleotide GTAAGT is

10000 times 10000 times 05093 times 07778

times 08241 times 04537 = 01481 (or 1481)(1)

(binomial distribution 119899 = 216) and the random probabilityof the sequence GTAAGx is

10000 times 10000 times 05093 times 07778 times 08241

= 03265 (or 3265) (2)

The expected random probabilities were compared withthe actual frequencies Analysis of the initial hexanu-cleotides showed that the sequences GTGAGx GTGAxTand GTGxGT are significantly more expressed than could beexpected (Table 6) Analysis of all the hexanucleotides whichincluded these sequences demonstrated that in GTGAGT(which included all three overexpressed sequences) the over-expression was highly significant (expected 875 actual1389 119875 lt 001) overexpression of GTGAGG (whichincluded the GAG motif only) was less significant possiblydue to the lower frequency (expected 161 actual 324119875 = 006) None of the other hexanucleotides whichincluded some of the overexpressed sequences exhibitedsignificantly different expression levels from those expecteddue to the fact that they contained other not overexpressed orunderexpressed motifs

Sequences GTAAGx and GTAxGG were found to besignificantly less expressed than could be expected (Table 6)The hexanucleotide GTAAGG which included both under-expressed sequences was clearly underexpressed although ata low significance level due to the low frequency (expected272 actual 093 119875 = 007) All other hexanucleotideswhich included some of the underexpressed sequences didnot exhibit significantly different expression levels from thoseexpected

Another observation supports the role of the trinu-cleotides GAG and AAG in the association to overexpres-sion and underexpression respectively The hexanucleotidesGTGAGT and GTAAGT differ at the third nucleotide onlyAs at the third nucleotide the actual conservation of Ais 169 times the actual conservation of G (50933009

6 International Journal of Evolutionary Biology

Table 5 Columns 1ndash4 the nucleotide composition according to the position in the terminal hexanucleotides inMouse andHuman Columns1 and 5ndash7 analysis of the conservation of individual nucleotides in the terminal hexanucleotides of the introns Expected random conservationin column 5 and actual conservation in column 6 Percentages are over the total of 216 couples of hexanucleotides

1 2 3 4 5 6 7Sequence in Mouse in Human Random conservation () Actual conservation ()AxxxAG 51 56 029 231 119875 lt 001

CxxxAG 352 338 119 2361 119875 lt 001

GxxxAG 83 46 119875 = 001 038 370 119875 lt 001

TxxxAG 514 560 2878 4259 119875 lt 001

Total 4135 7221 119875 lt 001

xAxxAG 79 93 073 509 119875 lt 001

xCxxAG 259 222 575 1389 119875 lt 001

xGxxAG 83 83 069 509 119875 lt 001

xTxxAG 579 602 3486 4861 119875 lt 001

Total 4203 7268 119875 lt 001

xxAxAG 273 269 734 1713 119875 lt 001

xxCxAG 273 291 794 1667 119875 lt 001

xxGxAG 199 213 424 926 119875 lt 001

xxTxAG 255 227 579 1667 119875 lt 001

Total 2531 5973 119875 lt 001

xxxAAG 74 70 052 602 119875 lt 001

xxxCAG 630 597 376 5046 119875 lt 001

xxxGAG 00 00 000 000xxxTAG 296 333 986 2083 119875 lt 001

Total 4798 7731 119875 lt 001

Table 6 Conservation of groups of nucleotides (underlined) in the initial hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

GTGAGx 1929 2546 617 002GTGAxT 1062 1435 373 005GTGxGT 1125 1528 403 004

GTAAGx 3265 2639 minus626 003GTAxGG 350 093 minus257 002

Table 3) GTAAGT could be expected to be more expressedthan GTGAGT by about 70 On the contrary GTGAGTwas actually more expressed than GTAAGT by about 2(185181 Table 2) and indeed it was the most frequentlyexpressed initial hexanucleotide Similarly GTAAGG shouldbe more expressed than GTGAGG by 70 but actuallyGTGAGG was more expressed than GTAAGG by about 27(5644 Table 2)

37 Conservation of the Individual Nucleotides in the TerminalHexanucleotides of the Introns Conservationmutation inthe couples of MouseHuman orthologous terminal hexanu-cleotides was also studied In 282 only of the couples wasthere a complete identity between the Mouse and Humanorthologous hexanucleotides In a higher percentage of cases(379) one nucleotide (regardless of position) was changedA decreasing percentage of couples exhibited two or threenucleotide differences and in only 09 of cases were allfour variable nucleotides changed (Figure 4) The number ofchanges per hexanucleotide averaged 117

The probability of random occurrence of the samenucleotide at a given position in orthologous MouseHumanhexanucleotides was calculated as previously reported(Table 5 column 5) The actual occurrence () of nucleotideidentities at each position is reported in column 6 of Table 5The percentages of actual identities were always significantly(119875 lt 001) higher than the percentages of random identitiesIn the nucleotides of the terminal hexanucleotides the actualtotal conservation was about 80 higher than the calculatedpurely random conservation

38 Conservation of Groups of Nucleotides in the TerminalHexanucleotides of the Introns The contribution of groupsof bases in the terminal hexanucleotides in favoring theMouseHuman conservation was studied by comparing theirexpected random probabilities with the observed actualfrequencies (see previous paragraph for the computing ofrandom probabilities and Table 5 column 6 for the actualconservation of the individual nucleotides at the differentpositions)

International Journal of Evolutionary Biology 7

Table 7 Conservation of groups of nucleotides (underlined) in the terminal hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

TTTxAG 345 plusmn 268 880 535 lt001TTxTAG 431 plusmn 298 926 495 lt001xTTTAG 169 plusmn 189 463 294 lt001TxTTAG 148 plusmn 177 417 269 lt001xTGCAG 227 plusmn 219 509 282 001

TTxxAG 2070 plusmn 595 2963 893 lt001xxTTAG 347 plusmn 269 648 301 002xTGxAG 450 plusmn 305 833 383 lt001

0

10

20

30

40

50

60

0 1 2 3 4

Figure 4 Abscissa number of nucleotide differences (irrespectiveof the position) in orthologous MouseHuman terminal hexanu-cleotides Ordinate percentage

Analysis of the terminal hexanucleotides showed thatthe nucleotide sequences TTTxAG TTxTAG xTTTAGTxTTAG and xTGCAG are significantly more expressedthan could be expected (Table 7) Furthermore some dinu-cleotides derived from the above sequences are themselvesoverexpressed TTxxAG (derived from TTTxAG and TTx-TAG) xxTTAG (derived from xTTTAG and TxTTAG) andxTGxAG (derived from xTGCAG) (Table 7)

Three hexanucleotides were found to be significantlyoverexpressed as compared to the random probabilityTTTCAG (expected 174 actual 463 119875 lt 001)TTTTAG (expected 072 actual 370 119875 lt 001) andCTGCAG (expected 054 actual 185 119875 = 003) Allthese hexanucleotides contained at least one of the conservedmotifs Other hexanucleotides although containing some ofthe overexpressed trinucleotides did not exhibit significantlydifferent conservation levels from those of the randomprobability and in no case did the occurrence of one of theconserved dinucleotides determine a higher conservation ofthe corresponding hexanucleotides

No motif associated with a significant underexpressionwas found

39 Splicing Variants Splicing variants of the individualgenes are usually of a different type in Mouse and Humanso the positions of the splicing sites no longer correspondin the two species (ie the orthology of the splicing sites is

lost) even though the main traits of the protein products arepreserved In only one instance in receptor KIT Mouse andHuman exhibited the same type of variant that is the use ofanother 51015840ss 12 nucleotides upstream of the canonical site

In several cases one entire exon present in the canonicalform is lacking in a variant although the other downstreamexons are regularly transcribed indicating that the constitu-tive 31015840ss at the end of the preceding intron did not operate Asan example in the transcript variant X1 (XM 005252446) ofthe Human IL2RA the canonical 31015840ss TTCCAG immediatelyupstream of exon 4 is silenced and the fourth exon (216 nt72 aa) of the canonical sequence (NM 000417) is lackingin this variant In other cases up to three consecutive 31015840sssare silenced in the Human IL2RG transcript variant X2(XM 005262262) 31015840sss ATCTAG CTCTAG and CTCCAGare all silenced with loss of exons 2 3 and 4 while exons5 6 7 and 8 are transcribed as in the canonical form(NM 000206) without alteration of the reading frame

In other cases the active alternative ss may be locatedupstream or downstream of the canonical ss resulting in analteration of the length of individual exons usually withoutframeshifts For example in theHuman variant X1 of CSF2RB(XM 005261340) the intron upstream of exon 7 ends byan ldquoearlyrdquo CCTCAG while the corresponding intron in thecanonical sequence (NM 007780) ends at another CCTCAG18 nt downstream resulting in an 18-nt longer exon in thevariant Although variations of the 31015840sss are more frequentthe 51015840sss may also be variable As an example in Humantranscript variant X1 of CD40 (XM 005260617) the 51015840ss afterexon 7 is GTGGGG replacing the GTGAGT of the canonicalvariant (NM 001250) which occurs 12 nt upstream

Detailed results are shown in the SupplementaryMaterialavailable at httpdxdoiorg1011552013818954

310 The Binding of the U1 snRNA We studied the ungappedbase-pairing between the U1 snRNA sequence ACTTAC(NCBI GenBank accession code NR 004430) conserved inMouse and Human and the last three nucleotides of theexons plus the corresponding 51015840ss hexanucleotide The 51015840ssGTAAGT which exactly base-pairs with the U1 is expressedin 190 of cases in Mouse and 171 of cases in Human(Table 2) In all other cases the match is not perfect Onaverage a complementary match between the U1 nucleotidesand the four variable nucleotides of the 51015840ss is achieved in

8 International Journal of Evolutionary Biology

67 of nucleotides The highest match is with the secondvariable nucleotide (the fourth of the 51015840ss) with 78 ofoccurrences while the matching of all the other variablenucleotides averages 50 Indeed all seven more expressed51015840ss of Mouse and Human have an ldquoArdquo in the fourth position(Table 2) The lowest matching is with the sixth nucleotideHowever the number of matches in the individual 51015840sss isvery variable and in a few cases (eg GTTTCG) only theinitial GTmatcheswithU1 In very few caseswe observed thatthe optimal ungapped base-pairing of U1 could be achievedby partly binding the last three nucleotides of the exon

4 Discussion

This study was aimed at describing the conservationmuta-tion dynamics of the intron ends of the cytokine receptorgenes during the speciation processes to Mouse and Humanwhich began starting from a common ancestor 65ndash85 MYA[32 33] We selected 26 orthologous genes of the two speciesin which the MouseHuman topographic correspondence ofexons had been consistently preserved The selected genescoded for receptors belonging to different cytokine receptorgroups (Table 1)We firstly identified the nucleotide identitiesat the corresponding positions in Mouse and Human in thefirst 50 and last 50 nucleotides of each intron All intronsstudied (216 couples) started and ended with the canonicaldinucleotides GT and AG respectively conforming to theusual situation This preliminary analysis demonstrated thatin general the four nucleotides following GT at the 51015840ss andthe four nucleotides preceding AG at the 31015840ss were morehighly conserved as compared to the other nucleotides atboth ends of the introns (Figures 1-2) We thus undertooka more detailed analysis of the initial and terminal intronichexanucleotides

All the 16 hexanucleotides beginning by GT and endingby AG might act both as 51015840ss and 31015840ss In our sample 8 ofthese sequences appeared either at 51015840 or at 31015840 but never inboth It may thus be hypothesized that some mechanism ofdisambiguation may operate possibly related to other exonicor intronic ldquosignalsrdquo involved in the splicing processes

The structure of the hexanucleotides actually expressed atthe beginning and end of the introns is quite variable even ifsome configurations are more frequent (Tables 2 and 4) Inour sample the initial hexanucleotide appeared in 51 differentconfigurations while the terminal hexanucleotide appearedin 94 different configurations indicating a lower selectivepressure on the terminal hexanucleotideThis is confirmed bythe MouseHuman conservationmutation data the averagemutation per hexanucleotide was 058 in the initial hexanu-cleotides and 117 in the terminal hexanucleotides and thetotal hexanucleotide conservation was 574 for initial hex-anucleotides and only 282 for terminal hexanucleotides

The percentages of occurrence of the more expressedinitial and terminal hexanucleotides did not differ signifi-cantly inMouse and Human (Tables 2 and 4)The percentageof the individual nucleotides at each position in the initialand terminal hexanucleotides was also similar in Mouseand Human and our results are in general comparable to

those reported in the literature [25 27 28] In our samplein the sixth nucleotide of the initial hexanucleotides C wassignificantly more expressed in the Mouse than in Humanand in the first nucleotide of the terminal hexanucleotidesG was significantly more expressed in the Mouse than inHuman (Tables 3 and 5) In our sample the nucleotideG was never present in the fourth position of terminalhexanucleotides

From the data on the actual frequencies of the indi-vidual nucleotides at each position in the initial and ter-minal hexanucleotides in Mouse and Human separately wecalculated the probability of random conservation of anygiven nucleotide at a given position in both species Theresults are shown in Table 3 (initial hexanucleotides) andTable 5 (terminal hexanucleotides) together with the resultsobtained on the actual conservation of each nucleotide ata given position in both Mouse and Human As shown inthe Tables the actual conservation was always significantlyhigher than the random conservation On average the actualconservation exceeded by 53 the random conservationin the initial hexanucleotides and by 80 in the terminalhexanucleotides These ldquogainsrdquo express the conservation dueto a selective evolutionary pressure to keep the configurationof the common ancestor at the level of the individualnucleotides

We also analyzed the conservation of groups of the vari-able nucleotides in initial and terminal hexanucleotides Therandom conservation of groups of three or two nucleotideswas calculated and compared with the actual frequenciesAll the motifs of the initial hexanucleotides found to beassociated with a higher or a lower conservation are reportedin the Results section The most remarkably conservedtrinucleotide was xxGAGx while xxAAGx was significantlyunderexpressed Accordingly GTGAGT and GTGAGGwereoverexpressed while GTAAGG was underexpressed Since inthe third position A is more conserved than G by about70 GTGAGT and GTGAGG should be less expressedthan GTAAGT and GTAAGG respectively by a similarpercentage Actually GTGAGT and GTGAGG were moreexpressed than the similar sequences with A in the thirdposition so that in our sample GTGAGT was the mostfrequently expressed initial hexanucleotide and GTAAGTwas only the second most frequently occurring sequence(Table 2)

In the terminal hexanucleotides the sequences TTTxAGand xTGCAG were overexpressed and accordinglyTTTCAG TTTTAG and CTGCAG were more expressedthan the random expectance These three sequences arethe most frequently expressed terminal hexanucleotides(Table 4) No specific motif entailing terminal nucleotideunderexpression could be demonstrated

It should be remarked that the presence of overex-pressed or underexpressed trinucleotides seems to be acondition which although necessary is not sufficient perse to entail over- or underexpression of the correspondinghexanucleotides since the outcome also depends on the othervariable nucleotide

The present results suggest that the conservation of eachnucleotide of the initial and terminal hexanucleotides is

International Journal of Evolutionary Biology 9

dependent upon the conservationmutation of the othernucleotides and an evolutionary selective pressure oper-ates to keep certain global configurations derived from theMouseHuman common ancestor while other configurationstend to be less conserved In particular the initial con-figuration GTAAGT which corresponded to the maximalconservation of the individual nucleotides at positions 3 to6 (Table 3 column 6) and is the one which exactly base-pairswith U1 snRNA appeared to be negatively biased

In general the requirement for base-pairing of the initialhexanucleotide with U1 seems not to be stringent since inour sample complete matching was observed in 18 only ofthe 51015840sss and 51015840sss with as few as three matching nucleotidesbeing effective (13 of cases) In very few cases optimal base-pairing ofU1 could be achieved by partly binding the terminaltrinucleotide of the exon

Analysis of the transcript variants of the genes consideredin the present study revealed that no type of variant iscommon in Mouse and Human except in one single caseThis suggests that these variants were not inherited fromthe common ancestor but emerged only later during thespeciation process One frequent variant is the silencingof a constitutive 31015840ss which leads to the skipping of theimmediately downstream exon with transcription being thenresumed for the other downstream exons Another type ofvariant is the activation of an alternative 51015840ss or 31015840ss usuallylocated a few nucleotides upstream or downstream of thecanonical splicing site the latter being silenced This leadsto alterations of the nucleotide number of the neighboringexons but usually without frameshifts Most constitutive sssespecially at 51015840 are robust enough to be maintained steadilyactive but 104 of the 31015840sss and 32 of the 51015840sss were foundto be variable Our analysis could not reveal any specific trendtowards the silencing of constitutive sss or the activation ofalternative sss as certain hexanucleotides which are silencedat given sites are on the contrary activated at other sites toreplace constitutive sss as for example the 51015840sss GTGAGAand GTGGGA and the 31015840ss CTCCAG Possibly silencingor activation may depend in a complex way on the generalcontext [20]

To conclude analysis of the nucleotide conservation inthe 51015840ss and 31015840ss hexanucleotides of Mouse and Humanintrons reveals definite evolutionary positive biases towardsthe conservation of some specific configurations and neg-ative biases against other configurations However the 51015840sshexanucleotides and especially the 31015840ss hexanucleotides stillexhibit a wide structural variability confirming the con-tention that splicing depends on a complex code whichbesides the 51015840ss and 31015840ss possibly involves additional signalsfrom the neighboring exons or from sections of the intronother than the splice sites and might also be regulated by thesecondary and tertiary structures forming in the pre-mRNAmolecule [12ndash14 16ndash23]

Acknowledgments

The authors gratefully thank Doctor Mary Victoria CandacePragnell and Professor Vincenzo Mitolo for constructivecriticisms and reviewing the paper

References

[1] S Stamm S Ben-Ari I Rafalska et al ldquoFunction of alternativesplicingrdquo Gene vol 344 pp 1ndash20 2005

[2] A J Matlin F Clark and C W J Smith ldquoUnderstanding alte-rnative splicing towards a cellular coderdquoNature ReviewsMolec-ular Cell Biology vol 6 no 5 pp 386ndash398 2005

[3] M Hallegger M Llorian and C W J Smith ldquoAlternativesplicing global insights minireviewrdquo FEBS Journal vol 277 no4 pp 856ndash866 2010

[4] E Conti and E Izaurralde ldquoNonsense-mediated mRNA decaymolecular insights and mechanistic variations across speciesrdquoCurrent Opinion in Cell Biology vol 17 no 3 pp 316ndash325 2005

[5] F Lejeune and L E Maquat ldquoMechanistic links between non-sense-mediated mRNA decay and pre-mRNA splicing in mam-malian cellsrdquo Current Opinion in Cell Biology vol 17 no 3 pp309ndash315 2005

[6] J Weischenfeldt J Lykke-Andersen and B Porse ldquoMessengerRNA surveillance neutralizing natural nonsenserdquoCurrent Biol-ogy vol 15 no 14 pp R559ndashR562 2005

[7] E V Koonin ldquoThe origin of introns and their role in eukaryo-genesis a compromise solution to the introns-early versusintrons-late debaterdquoBiologyDirect vol 1 no 22 pp 1ndash23 2006

[8] J P Orengo and T A Cooper ldquoAlternative splicing in diseaserdquoAdvances in Experimental Medicine and Biology vol 623 pp212ndash223 2007

[9] A S Solis N Shariat and J G Patton ldquoSplicing fidelity enhan-cers and diseaserdquo Frontiers in Bioscience vol 13 no 5 pp 1926ndash1942 2008

[10] C A Pettigrew and M A Brown ldquoPre-mRNA splicing aberra-tions and cancerrdquo Frontiers in Bioscience vol 13 no 3 pp 1090ndash1105 2008

[11] M A Panaro V Mitolo A Cianciulli P Cavallo C I MitoloandAAcquafredda ldquoTheHIV-1 Rev binding family of proteinsthe dog proteins as a study modelrdquo Endocrine Metabolic andImmune Disorders vol 8 no 1 pp 30ndash46 2008

[12] M D Adams D Z Rudner and D C Rio ldquoBiochemistry andregulation of pre-mRNA splicingrdquo Current Opinion in CellBiology vol 8 no 3 pp 331ndash339 1996

[13] A R Krainer Eukaryotic mRNA Processing Oxford UniversityPress New York NY USA 1997

[14] C B Burge T Tuschl and P A Sharp Splicing of Precursors tomRNAs by the Spliceosome Cold Spring Laboratory HarborPress Cold Spring Harbor NY USA 1999

[15] B J Blencowe ldquoExonic splicing enhancers mechanism ofaction diversity and role in human genetic diseasesrdquo Trends inBiochemical Sciences vol 25 no 3 pp 106ndash110 2000

[16] X H Zhang and L A Chasin ldquoComputational definition ofsequence motifs governing constitutive exon splicingrdquo Genesand Development vol 18 no 11 pp 1241ndash1250 2004

[17] J Hui L Hung M Heiner et al ldquoIntronic CA-repeat and CA-rich elements a new class of regulators of mammalian alterna-tive splicingrdquoEMBO Journal vol 24 no 11 pp 1988ndash1998 2005

[18] J L Kabat S Barberan-Soler P McKenna H Clawson TFarrer and A M Zahler ldquoIntronic alternative splicing regula-tors identified by comparative genomics in nematodesrdquo PLoSComputational Biology vol 2 no 7 p e86 2006

[19] C Dominguez and F H-T Allain ldquoNMR structure of the threequasi RNA recognition motifs (qRRMs) of human hnRNP Fand interaction studies with Bcl-x G-tract RNA a novel modeof RNA recognitionrdquo Nucleic Acids Research vol 34 no 13 pp3634ndash3645 2006

10 International Journal of Evolutionary Biology

[20] L A Chasin ldquoSearching for splicing motifsrdquo Advances in Expe-rimental Medicine and Biology vol 623 pp 85ndash106 2007

[21] LHHungMHeiner J Hui S Schreiner V Benes andA Bin-dereif ldquoDiverse roles of hnRNP L in mammalian mRNA proc-essing a combined microarray and RNAi analysisrdquo RNA vol14 no 2 pp 284ndash296 2008

[22] M A Panaro A Cianciulli R Calvello et al ldquoAn analysis of thehuman chemokine CXC receptor 4 generdquo Immunopharmacol-ogy and Immunotoxicology vol 31 no 1 pp 88ndash93 2009

[23] J Zhang C C Kuo and L Chen ldquoGC content around splicesites affects splicing through pre-mRNA secondary structuresrdquoBMC Genomics vol 12 no 90 pp 1ndash11 2011

[24] TWNilsen ldquoThe spliceosome themost complexmacromolec-ularmachine in the cellrdquoBioEssays vol 25 no 12 pp 1147ndash11492003

[25] J F Abril R Castelo and R Guigo ldquoComparison of splice sitesin mammals and chickenrdquo Genome Research vol 15 no 1 pp111ndash119 2005

[26] N Sheth X Roca M L Hastings T Roeder A R Krainer andR Sachidanandam ldquoComprehensive splice-site analysis usingcomparative genomicsrdquo Nucleic Acids Research vol 34 no 14pp 3955ndash3967 2006

[27] S H Schwartz J Silva D Burstein T Pupko E Eyras and GAst ldquoLarge-scale comparative analysis of splicing signals andtheir corresponding splicing factors in eukaryotesrdquo GenomeResearch vol 18 no 1 pp 88ndash103 2008

[28] E V Kriventseva and M S Gelfand ldquoStatistical analysis of theexon-intron structure of higher and lower eukaryote genesrdquoJournal of Biomolecular Structure and Dynamics vol 17 no 2pp 281ndash288 1999

[29] M Burset I A Seledtsov and V V Solovyev ldquoSpliceDB dat-abase of canonical and non-canonical mammalian splice sitesrdquoNucleic Acids Research vol 29 no 1 pp 255ndash259 2001

[30] M Burset I A Seledtsov and V V Solovyev ldquoAnalysis ofcanonical and non-canonical splice sites in mammaliangenomesrdquoNucleic Acids Research vol 28 no 21 pp 4364ndash43752000

[31] V Shepelev and A Fedorov ldquoAdvances in the exon-intron dat-abase (EID)rdquo Briefings in Bioinformatics vol 7 no 2 pp 178ndash185 2006

[32] M Foote J P Hunter C M Janis and J J Sepkoski Jr ldquoEvolu-tionary and preservational constraints on origins of biologicgroups divergence times of eutherian mammalsrdquo Science vol283 no 5406 pp 1310ndash1314 1999

[33] M S Lee ldquoMolecular clock calibrations and metazoan diver-gence datesrdquo Journal of Molecular Evolution vol 49 no 3 pp385ndash391 1999

[34] K M Murphy P Travers and M Walport Janewayrsquos Immuno-biology Garland Science Oxford UK 7th edition 2007

[35] R Coico and G Sunshine Immunology A Short Course Wiley-Blackwell Hoboken NJ USA 6th edition 2008

[36] F Corpet ldquoMultiple sequence alignmentwith hierarchical clust-eringrdquo Nucleic Acids Research vol 16 no 22 pp 10881ndash108901988

[37] M L Samuels and J A Witmer Statistics for the Life SciencesPrentice Hall Upper Saddle River NJ USA 3rd edition 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 4: Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor …downloads.hindawi.com/archive/2013/818954.pdf · 2019. 7. 31. · transforming growth factor beta

4 International Journal of Evolutionary Biology

Table 1 The cytokine receptors analyzed in this study (1) is theimmunoglobulin superfamily receptors (2) is the class I cytokinereceptor family (3) is the class II cytokine receptor family (inter-feron receptor family) (4) is the tumor necrosis factor receptorsuperfamily (5) is the chemokine receptor family (6) is thetransforming growth factor beta receptor family

RECEPTOR (Receptor family) (Number of exons)Interleukin 1 receptor type I (IL1R1) (1) (10 exons)C-Kit Hardy-Zuckerman 4 Feline Sarcoma Viral OncogeneHomolog (KIT) (1) (21 exons)Interleukin 2 receptor alpha (IL2RA) (2) (8 exons)Interleukin 2 receptor beta (IL2RB) (2) (9 exons)Interleukin 2 receptor gamma (IL2RG) (2) (8 exons)Interleukin 4 receptor (IL4R) (2) (9 exons)Interleukin 13 receptor alpha 1 (IL13RA1) (2) (11 exons)Interleukin 13 receptor alpha 2 (IL13RA2) (2) (9 exons)Erythropoietin receptor (EPOR) (2) (8 exons)Colony stimulating factor 2 receptor beta low-affinity(granulocyte-macrophage) (CSF2RB) (2) (13 exons)Colony stimulating factor 3 receptor (granulocyte) (CSF3R) (2)(15 exons)Leukemia inhibitory factor receptor alpha (LIFR) (2) (19 exons)Prolactin receptor (PRLR) (2) (8 exons)Oncostatin M receptor (OSMR) (2) (17 exons)Interleukin 10 receptor alpha (IL10RA) (3) (7 exons)Interleukin 10 receptor beta (IL10RB) (3) (7 exons)Interferon (alpha beta and omega) receptor 1 (IFNAR1) (3) (11exons)CD27 molecule (CD27) (4) (6 exons)CD40 molecule TNF receptor superfamily member 5 (CD40) (4)(9 exons)Lymphotoxin beta receptor (TNFR superfamily member 3)(LTBR) (4) (10 exons)Chemokine (C-C motif) receptor 7 (CCR7) (5) (3 exons)Chemokine (C-C motif) receptor 9 (CCR9) (5) (2 exons)Chemokine (C-C motif) receptor 10 (CCR10) (5) (2 exons)Chemokine (C-X-C motif) receptor 4 (CXCR4) (5) (2 exons)Chemokine (C-X-C motif) receptor 5 (CXCR5) (5) (2 exons)Transforming growth factor beta receptor III (TGFBR3) (6) (16exons)

and 31015840ss (ldquobifunctionalrdquo hexanucleotides) Of these 8 neverappeared as initial or terminal hexanucleotides in our sampleand 3 (GTAAAG GTACAG and GTTAAG) appeared at thebeginning (but never at the end) of some introns while 5(GTCCAG GTGCAG GTGTAG GTTCAG and GTTTAG)appeared at the end (but never at the beginning) of otherintrons in both Mouse and Human

35 Conservation of the Individual Nucleotides in the Ini-tial Hexanucleotides of the Introns The structural conserva-tionmutation in the couples of MouseHuman orthologousinitial hexanucleotides was studied In more than half of

Table 2 The initial more frequently expressed hexanucleotidesPercentages are over the total of 216 couples of hexanucleotides

Sequence inMouse

inHuman

Average()

GTGAGT 190 181 185GTAAGT 190 171 181GTAAGA 88 88 88GTGAGA 60 51 56GTGAGG 42 69 56GTAAGG 37 51 44GTAAGC 42 28 35Total 649 639 645

0

10

20

30

40

50

60

0 1 2 3 4

Figure 3Abscissa number of nucleotide differences (irrespective ofthe position) in orthologousMouseHuman initial hexanucleotidesOrdinate percentage

the couples (574) there was a complete identity betweenthe Mouse and Human orthologous hexanucleotides Adecreasing percentage of couples exhibited one two or threenucleotide differences (regardless of position) in no instanceall four variable nucleotides changed (Figure 3) The numberof changes per hexanucleotide averaged 058

We calculated the probability of random occurrenceof the same nucleotide at a given position in orthologousMouseHuman hexanucleotides (Table 3 column 5) Forinstance A in the third position is present with probability0597 in the Mouse and 0570 in the Human (Table 3GTAxxx columns 2 and 3) then the probability of randomoccurrence of A at that position in both Mouse and Humanis 0597 times 0570 = 03403 (or 3403) (binomial distribution119899 = 216) The actual occurrence () of nucleotide identitiesat each position is reported in column 6 of Table 3 Thepercentages of actual identities were always significantly (119875 lt001) higher than the percentages of random identities andthe difference expresses the level of nucleotide biologicalconservation In the nucleotides of the initial hexanucleotidesthe actual total conservation was about 53 higher than thecalculated purely random conservation

36 Conservation of Groups of Nucleotides in the InitialHexanucleotides of the Introns The contribution of groupsof bases in the initial hexanucleotides to enhancing or

International Journal of Evolutionary Biology 5

Table 3 Columns 1ndash4 the nucleotide composition according to the position in the initial hexanucleotides in Mouse and Human Columns1 and 5ndash7 analysis of the conservation of individual nucleotides in the initial hexanucleotides of the introns Expected random conservationin column 5 and actual conservation in column 6 Percentages are over the total of 216 couples of hexanucleotides

1 2 3 4 5 6 7Sequence in Mouse in Human Random conservation () Actual conservation ()GTAxxx 597 575 3403 5093 119875 lt 001

GTCxxx 19 23 004 185 119875 lt 001

GTGxxx 370 384 1421 3009 119875 lt 001

GTTxxx 14 23 003 139 119875 lt 001

Total 4831 8426 119875 lt 001

GTxAxx 824 810 6674 7778 119875 lt 001

GTxCxx 28 28 008 231 119875 lt 001

GTxGxx 69 83 057 417 119875 lt 001

GTxTxx 79 79 062 602 119875 lt 001

Total 6801 9028 119875 lt 001

GTxxAx 79 74 058 648 119875 lt 001

GTxxCx 14 18 003 093 119875 lt 001

GTxxGx 852 843 7182 8241 119875 lt 001

GTxxTx 55 65 036 370 119875 lt 001

Total 7279 9352 119875 lt 001

GTxxxA 218 241 525 1620 119875 lt 001

GTxxxC 116 74 119875 = 002 086 370 119875 lt 001

GTxxxG 166 199 330 833 119875 lt 001

GTxxxT 500 486 2430 4537 119875 lt 001

Total 3371 736 119875 lt 001

Table 4 The terminal more frequently expressed hexanucleotidesPercentages are over the total of 216 couples of hexanucleotides

Sequence in Mouse in Human Average ()TTTCAG 74 65 69CTGCAG 79 46 63TTTTAG 51 60 56TTGCAG 37 69 53TTCTAG 32 65 49CCACAG 46 32 39TTCCAG 51 28 39Total 370 365 368

reducing the level of MouseHuman conservation was thenstudied The random probability of occurrence of completehexanucleotides or groups of three or two nucleotides atgiven positions was calculated on the basis of the percentagesof the actual conservation of the individual componentnucleotides at the corresponding positions as reported inTable 3 (column 6) For example the random probability ofthe hexanucleotide GTAAGT is

10000 times 10000 times 05093 times 07778

times 08241 times 04537 = 01481 (or 1481)(1)

(binomial distribution 119899 = 216) and the random probabilityof the sequence GTAAGx is

10000 times 10000 times 05093 times 07778 times 08241

= 03265 (or 3265) (2)

The expected random probabilities were compared withthe actual frequencies Analysis of the initial hexanu-cleotides showed that the sequences GTGAGx GTGAxTand GTGxGT are significantly more expressed than could beexpected (Table 6) Analysis of all the hexanucleotides whichincluded these sequences demonstrated that in GTGAGT(which included all three overexpressed sequences) the over-expression was highly significant (expected 875 actual1389 119875 lt 001) overexpression of GTGAGG (whichincluded the GAG motif only) was less significant possiblydue to the lower frequency (expected 161 actual 324119875 = 006) None of the other hexanucleotides whichincluded some of the overexpressed sequences exhibitedsignificantly different expression levels from those expecteddue to the fact that they contained other not overexpressed orunderexpressed motifs

Sequences GTAAGx and GTAxGG were found to besignificantly less expressed than could be expected (Table 6)The hexanucleotide GTAAGG which included both under-expressed sequences was clearly underexpressed although ata low significance level due to the low frequency (expected272 actual 093 119875 = 007) All other hexanucleotideswhich included some of the underexpressed sequences didnot exhibit significantly different expression levels from thoseexpected

Another observation supports the role of the trinu-cleotides GAG and AAG in the association to overexpres-sion and underexpression respectively The hexanucleotidesGTGAGT and GTAAGT differ at the third nucleotide onlyAs at the third nucleotide the actual conservation of Ais 169 times the actual conservation of G (50933009

6 International Journal of Evolutionary Biology

Table 5 Columns 1ndash4 the nucleotide composition according to the position in the terminal hexanucleotides inMouse andHuman Columns1 and 5ndash7 analysis of the conservation of individual nucleotides in the terminal hexanucleotides of the introns Expected random conservationin column 5 and actual conservation in column 6 Percentages are over the total of 216 couples of hexanucleotides

1 2 3 4 5 6 7Sequence in Mouse in Human Random conservation () Actual conservation ()AxxxAG 51 56 029 231 119875 lt 001

CxxxAG 352 338 119 2361 119875 lt 001

GxxxAG 83 46 119875 = 001 038 370 119875 lt 001

TxxxAG 514 560 2878 4259 119875 lt 001

Total 4135 7221 119875 lt 001

xAxxAG 79 93 073 509 119875 lt 001

xCxxAG 259 222 575 1389 119875 lt 001

xGxxAG 83 83 069 509 119875 lt 001

xTxxAG 579 602 3486 4861 119875 lt 001

Total 4203 7268 119875 lt 001

xxAxAG 273 269 734 1713 119875 lt 001

xxCxAG 273 291 794 1667 119875 lt 001

xxGxAG 199 213 424 926 119875 lt 001

xxTxAG 255 227 579 1667 119875 lt 001

Total 2531 5973 119875 lt 001

xxxAAG 74 70 052 602 119875 lt 001

xxxCAG 630 597 376 5046 119875 lt 001

xxxGAG 00 00 000 000xxxTAG 296 333 986 2083 119875 lt 001

Total 4798 7731 119875 lt 001

Table 6 Conservation of groups of nucleotides (underlined) in the initial hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

GTGAGx 1929 2546 617 002GTGAxT 1062 1435 373 005GTGxGT 1125 1528 403 004

GTAAGx 3265 2639 minus626 003GTAxGG 350 093 minus257 002

Table 3) GTAAGT could be expected to be more expressedthan GTGAGT by about 70 On the contrary GTGAGTwas actually more expressed than GTAAGT by about 2(185181 Table 2) and indeed it was the most frequentlyexpressed initial hexanucleotide Similarly GTAAGG shouldbe more expressed than GTGAGG by 70 but actuallyGTGAGG was more expressed than GTAAGG by about 27(5644 Table 2)

37 Conservation of the Individual Nucleotides in the TerminalHexanucleotides of the Introns Conservationmutation inthe couples of MouseHuman orthologous terminal hexanu-cleotides was also studied In 282 only of the couples wasthere a complete identity between the Mouse and Humanorthologous hexanucleotides In a higher percentage of cases(379) one nucleotide (regardless of position) was changedA decreasing percentage of couples exhibited two or threenucleotide differences and in only 09 of cases were allfour variable nucleotides changed (Figure 4) The number ofchanges per hexanucleotide averaged 117

The probability of random occurrence of the samenucleotide at a given position in orthologous MouseHumanhexanucleotides was calculated as previously reported(Table 5 column 5) The actual occurrence () of nucleotideidentities at each position is reported in column 6 of Table 5The percentages of actual identities were always significantly(119875 lt 001) higher than the percentages of random identitiesIn the nucleotides of the terminal hexanucleotides the actualtotal conservation was about 80 higher than the calculatedpurely random conservation

38 Conservation of Groups of Nucleotides in the TerminalHexanucleotides of the Introns The contribution of groupsof bases in the terminal hexanucleotides in favoring theMouseHuman conservation was studied by comparing theirexpected random probabilities with the observed actualfrequencies (see previous paragraph for the computing ofrandom probabilities and Table 5 column 6 for the actualconservation of the individual nucleotides at the differentpositions)

International Journal of Evolutionary Biology 7

Table 7 Conservation of groups of nucleotides (underlined) in the terminal hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

TTTxAG 345 plusmn 268 880 535 lt001TTxTAG 431 plusmn 298 926 495 lt001xTTTAG 169 plusmn 189 463 294 lt001TxTTAG 148 plusmn 177 417 269 lt001xTGCAG 227 plusmn 219 509 282 001

TTxxAG 2070 plusmn 595 2963 893 lt001xxTTAG 347 plusmn 269 648 301 002xTGxAG 450 plusmn 305 833 383 lt001

0

10

20

30

40

50

60

0 1 2 3 4

Figure 4 Abscissa number of nucleotide differences (irrespectiveof the position) in orthologous MouseHuman terminal hexanu-cleotides Ordinate percentage

Analysis of the terminal hexanucleotides showed thatthe nucleotide sequences TTTxAG TTxTAG xTTTAGTxTTAG and xTGCAG are significantly more expressedthan could be expected (Table 7) Furthermore some dinu-cleotides derived from the above sequences are themselvesoverexpressed TTxxAG (derived from TTTxAG and TTx-TAG) xxTTAG (derived from xTTTAG and TxTTAG) andxTGxAG (derived from xTGCAG) (Table 7)

Three hexanucleotides were found to be significantlyoverexpressed as compared to the random probabilityTTTCAG (expected 174 actual 463 119875 lt 001)TTTTAG (expected 072 actual 370 119875 lt 001) andCTGCAG (expected 054 actual 185 119875 = 003) Allthese hexanucleotides contained at least one of the conservedmotifs Other hexanucleotides although containing some ofthe overexpressed trinucleotides did not exhibit significantlydifferent conservation levels from those of the randomprobability and in no case did the occurrence of one of theconserved dinucleotides determine a higher conservation ofthe corresponding hexanucleotides

No motif associated with a significant underexpressionwas found

39 Splicing Variants Splicing variants of the individualgenes are usually of a different type in Mouse and Humanso the positions of the splicing sites no longer correspondin the two species (ie the orthology of the splicing sites is

lost) even though the main traits of the protein products arepreserved In only one instance in receptor KIT Mouse andHuman exhibited the same type of variant that is the use ofanother 51015840ss 12 nucleotides upstream of the canonical site

In several cases one entire exon present in the canonicalform is lacking in a variant although the other downstreamexons are regularly transcribed indicating that the constitu-tive 31015840ss at the end of the preceding intron did not operate Asan example in the transcript variant X1 (XM 005252446) ofthe Human IL2RA the canonical 31015840ss TTCCAG immediatelyupstream of exon 4 is silenced and the fourth exon (216 nt72 aa) of the canonical sequence (NM 000417) is lackingin this variant In other cases up to three consecutive 31015840sssare silenced in the Human IL2RG transcript variant X2(XM 005262262) 31015840sss ATCTAG CTCTAG and CTCCAGare all silenced with loss of exons 2 3 and 4 while exons5 6 7 and 8 are transcribed as in the canonical form(NM 000206) without alteration of the reading frame

In other cases the active alternative ss may be locatedupstream or downstream of the canonical ss resulting in analteration of the length of individual exons usually withoutframeshifts For example in theHuman variant X1 of CSF2RB(XM 005261340) the intron upstream of exon 7 ends byan ldquoearlyrdquo CCTCAG while the corresponding intron in thecanonical sequence (NM 007780) ends at another CCTCAG18 nt downstream resulting in an 18-nt longer exon in thevariant Although variations of the 31015840sss are more frequentthe 51015840sss may also be variable As an example in Humantranscript variant X1 of CD40 (XM 005260617) the 51015840ss afterexon 7 is GTGGGG replacing the GTGAGT of the canonicalvariant (NM 001250) which occurs 12 nt upstream

Detailed results are shown in the SupplementaryMaterialavailable at httpdxdoiorg1011552013818954

310 The Binding of the U1 snRNA We studied the ungappedbase-pairing between the U1 snRNA sequence ACTTAC(NCBI GenBank accession code NR 004430) conserved inMouse and Human and the last three nucleotides of theexons plus the corresponding 51015840ss hexanucleotide The 51015840ssGTAAGT which exactly base-pairs with the U1 is expressedin 190 of cases in Mouse and 171 of cases in Human(Table 2) In all other cases the match is not perfect Onaverage a complementary match between the U1 nucleotidesand the four variable nucleotides of the 51015840ss is achieved in

8 International Journal of Evolutionary Biology

67 of nucleotides The highest match is with the secondvariable nucleotide (the fourth of the 51015840ss) with 78 ofoccurrences while the matching of all the other variablenucleotides averages 50 Indeed all seven more expressed51015840ss of Mouse and Human have an ldquoArdquo in the fourth position(Table 2) The lowest matching is with the sixth nucleotideHowever the number of matches in the individual 51015840sss isvery variable and in a few cases (eg GTTTCG) only theinitial GTmatcheswithU1 In very few caseswe observed thatthe optimal ungapped base-pairing of U1 could be achievedby partly binding the last three nucleotides of the exon

4 Discussion

This study was aimed at describing the conservationmuta-tion dynamics of the intron ends of the cytokine receptorgenes during the speciation processes to Mouse and Humanwhich began starting from a common ancestor 65ndash85 MYA[32 33] We selected 26 orthologous genes of the two speciesin which the MouseHuman topographic correspondence ofexons had been consistently preserved The selected genescoded for receptors belonging to different cytokine receptorgroups (Table 1)We firstly identified the nucleotide identitiesat the corresponding positions in Mouse and Human in thefirst 50 and last 50 nucleotides of each intron All intronsstudied (216 couples) started and ended with the canonicaldinucleotides GT and AG respectively conforming to theusual situation This preliminary analysis demonstrated thatin general the four nucleotides following GT at the 51015840ss andthe four nucleotides preceding AG at the 31015840ss were morehighly conserved as compared to the other nucleotides atboth ends of the introns (Figures 1-2) We thus undertooka more detailed analysis of the initial and terminal intronichexanucleotides

All the 16 hexanucleotides beginning by GT and endingby AG might act both as 51015840ss and 31015840ss In our sample 8 ofthese sequences appeared either at 51015840 or at 31015840 but never inboth It may thus be hypothesized that some mechanism ofdisambiguation may operate possibly related to other exonicor intronic ldquosignalsrdquo involved in the splicing processes

The structure of the hexanucleotides actually expressed atthe beginning and end of the introns is quite variable even ifsome configurations are more frequent (Tables 2 and 4) Inour sample the initial hexanucleotide appeared in 51 differentconfigurations while the terminal hexanucleotide appearedin 94 different configurations indicating a lower selectivepressure on the terminal hexanucleotideThis is confirmed bythe MouseHuman conservationmutation data the averagemutation per hexanucleotide was 058 in the initial hexanu-cleotides and 117 in the terminal hexanucleotides and thetotal hexanucleotide conservation was 574 for initial hex-anucleotides and only 282 for terminal hexanucleotides

The percentages of occurrence of the more expressedinitial and terminal hexanucleotides did not differ signifi-cantly inMouse and Human (Tables 2 and 4)The percentageof the individual nucleotides at each position in the initialand terminal hexanucleotides was also similar in Mouseand Human and our results are in general comparable to

those reported in the literature [25 27 28] In our samplein the sixth nucleotide of the initial hexanucleotides C wassignificantly more expressed in the Mouse than in Humanand in the first nucleotide of the terminal hexanucleotidesG was significantly more expressed in the Mouse than inHuman (Tables 3 and 5) In our sample the nucleotideG was never present in the fourth position of terminalhexanucleotides

From the data on the actual frequencies of the indi-vidual nucleotides at each position in the initial and ter-minal hexanucleotides in Mouse and Human separately wecalculated the probability of random conservation of anygiven nucleotide at a given position in both species Theresults are shown in Table 3 (initial hexanucleotides) andTable 5 (terminal hexanucleotides) together with the resultsobtained on the actual conservation of each nucleotide ata given position in both Mouse and Human As shown inthe Tables the actual conservation was always significantlyhigher than the random conservation On average the actualconservation exceeded by 53 the random conservationin the initial hexanucleotides and by 80 in the terminalhexanucleotides These ldquogainsrdquo express the conservation dueto a selective evolutionary pressure to keep the configurationof the common ancestor at the level of the individualnucleotides

We also analyzed the conservation of groups of the vari-able nucleotides in initial and terminal hexanucleotides Therandom conservation of groups of three or two nucleotideswas calculated and compared with the actual frequenciesAll the motifs of the initial hexanucleotides found to beassociated with a higher or a lower conservation are reportedin the Results section The most remarkably conservedtrinucleotide was xxGAGx while xxAAGx was significantlyunderexpressed Accordingly GTGAGT and GTGAGGwereoverexpressed while GTAAGG was underexpressed Since inthe third position A is more conserved than G by about70 GTGAGT and GTGAGG should be less expressedthan GTAAGT and GTAAGG respectively by a similarpercentage Actually GTGAGT and GTGAGG were moreexpressed than the similar sequences with A in the thirdposition so that in our sample GTGAGT was the mostfrequently expressed initial hexanucleotide and GTAAGTwas only the second most frequently occurring sequence(Table 2)

In the terminal hexanucleotides the sequences TTTxAGand xTGCAG were overexpressed and accordinglyTTTCAG TTTTAG and CTGCAG were more expressedthan the random expectance These three sequences arethe most frequently expressed terminal hexanucleotides(Table 4) No specific motif entailing terminal nucleotideunderexpression could be demonstrated

It should be remarked that the presence of overex-pressed or underexpressed trinucleotides seems to be acondition which although necessary is not sufficient perse to entail over- or underexpression of the correspondinghexanucleotides since the outcome also depends on the othervariable nucleotide

The present results suggest that the conservation of eachnucleotide of the initial and terminal hexanucleotides is

International Journal of Evolutionary Biology 9

dependent upon the conservationmutation of the othernucleotides and an evolutionary selective pressure oper-ates to keep certain global configurations derived from theMouseHuman common ancestor while other configurationstend to be less conserved In particular the initial con-figuration GTAAGT which corresponded to the maximalconservation of the individual nucleotides at positions 3 to6 (Table 3 column 6) and is the one which exactly base-pairswith U1 snRNA appeared to be negatively biased

In general the requirement for base-pairing of the initialhexanucleotide with U1 seems not to be stringent since inour sample complete matching was observed in 18 only ofthe 51015840sss and 51015840sss with as few as three matching nucleotidesbeing effective (13 of cases) In very few cases optimal base-pairing ofU1 could be achieved by partly binding the terminaltrinucleotide of the exon

Analysis of the transcript variants of the genes consideredin the present study revealed that no type of variant iscommon in Mouse and Human except in one single caseThis suggests that these variants were not inherited fromthe common ancestor but emerged only later during thespeciation process One frequent variant is the silencingof a constitutive 31015840ss which leads to the skipping of theimmediately downstream exon with transcription being thenresumed for the other downstream exons Another type ofvariant is the activation of an alternative 51015840ss or 31015840ss usuallylocated a few nucleotides upstream or downstream of thecanonical splicing site the latter being silenced This leadsto alterations of the nucleotide number of the neighboringexons but usually without frameshifts Most constitutive sssespecially at 51015840 are robust enough to be maintained steadilyactive but 104 of the 31015840sss and 32 of the 51015840sss were foundto be variable Our analysis could not reveal any specific trendtowards the silencing of constitutive sss or the activation ofalternative sss as certain hexanucleotides which are silencedat given sites are on the contrary activated at other sites toreplace constitutive sss as for example the 51015840sss GTGAGAand GTGGGA and the 31015840ss CTCCAG Possibly silencingor activation may depend in a complex way on the generalcontext [20]

To conclude analysis of the nucleotide conservation inthe 51015840ss and 31015840ss hexanucleotides of Mouse and Humanintrons reveals definite evolutionary positive biases towardsthe conservation of some specific configurations and neg-ative biases against other configurations However the 51015840sshexanucleotides and especially the 31015840ss hexanucleotides stillexhibit a wide structural variability confirming the con-tention that splicing depends on a complex code whichbesides the 51015840ss and 31015840ss possibly involves additional signalsfrom the neighboring exons or from sections of the intronother than the splice sites and might also be regulated by thesecondary and tertiary structures forming in the pre-mRNAmolecule [12ndash14 16ndash23]

Acknowledgments

The authors gratefully thank Doctor Mary Victoria CandacePragnell and Professor Vincenzo Mitolo for constructivecriticisms and reviewing the paper

References

[1] S Stamm S Ben-Ari I Rafalska et al ldquoFunction of alternativesplicingrdquo Gene vol 344 pp 1ndash20 2005

[2] A J Matlin F Clark and C W J Smith ldquoUnderstanding alte-rnative splicing towards a cellular coderdquoNature ReviewsMolec-ular Cell Biology vol 6 no 5 pp 386ndash398 2005

[3] M Hallegger M Llorian and C W J Smith ldquoAlternativesplicing global insights minireviewrdquo FEBS Journal vol 277 no4 pp 856ndash866 2010

[4] E Conti and E Izaurralde ldquoNonsense-mediated mRNA decaymolecular insights and mechanistic variations across speciesrdquoCurrent Opinion in Cell Biology vol 17 no 3 pp 316ndash325 2005

[5] F Lejeune and L E Maquat ldquoMechanistic links between non-sense-mediated mRNA decay and pre-mRNA splicing in mam-malian cellsrdquo Current Opinion in Cell Biology vol 17 no 3 pp309ndash315 2005

[6] J Weischenfeldt J Lykke-Andersen and B Porse ldquoMessengerRNA surveillance neutralizing natural nonsenserdquoCurrent Biol-ogy vol 15 no 14 pp R559ndashR562 2005

[7] E V Koonin ldquoThe origin of introns and their role in eukaryo-genesis a compromise solution to the introns-early versusintrons-late debaterdquoBiologyDirect vol 1 no 22 pp 1ndash23 2006

[8] J P Orengo and T A Cooper ldquoAlternative splicing in diseaserdquoAdvances in Experimental Medicine and Biology vol 623 pp212ndash223 2007

[9] A S Solis N Shariat and J G Patton ldquoSplicing fidelity enhan-cers and diseaserdquo Frontiers in Bioscience vol 13 no 5 pp 1926ndash1942 2008

[10] C A Pettigrew and M A Brown ldquoPre-mRNA splicing aberra-tions and cancerrdquo Frontiers in Bioscience vol 13 no 3 pp 1090ndash1105 2008

[11] M A Panaro V Mitolo A Cianciulli P Cavallo C I MitoloandAAcquafredda ldquoTheHIV-1 Rev binding family of proteinsthe dog proteins as a study modelrdquo Endocrine Metabolic andImmune Disorders vol 8 no 1 pp 30ndash46 2008

[12] M D Adams D Z Rudner and D C Rio ldquoBiochemistry andregulation of pre-mRNA splicingrdquo Current Opinion in CellBiology vol 8 no 3 pp 331ndash339 1996

[13] A R Krainer Eukaryotic mRNA Processing Oxford UniversityPress New York NY USA 1997

[14] C B Burge T Tuschl and P A Sharp Splicing of Precursors tomRNAs by the Spliceosome Cold Spring Laboratory HarborPress Cold Spring Harbor NY USA 1999

[15] B J Blencowe ldquoExonic splicing enhancers mechanism ofaction diversity and role in human genetic diseasesrdquo Trends inBiochemical Sciences vol 25 no 3 pp 106ndash110 2000

[16] X H Zhang and L A Chasin ldquoComputational definition ofsequence motifs governing constitutive exon splicingrdquo Genesand Development vol 18 no 11 pp 1241ndash1250 2004

[17] J Hui L Hung M Heiner et al ldquoIntronic CA-repeat and CA-rich elements a new class of regulators of mammalian alterna-tive splicingrdquoEMBO Journal vol 24 no 11 pp 1988ndash1998 2005

[18] J L Kabat S Barberan-Soler P McKenna H Clawson TFarrer and A M Zahler ldquoIntronic alternative splicing regula-tors identified by comparative genomics in nematodesrdquo PLoSComputational Biology vol 2 no 7 p e86 2006

[19] C Dominguez and F H-T Allain ldquoNMR structure of the threequasi RNA recognition motifs (qRRMs) of human hnRNP Fand interaction studies with Bcl-x G-tract RNA a novel modeof RNA recognitionrdquo Nucleic Acids Research vol 34 no 13 pp3634ndash3645 2006

10 International Journal of Evolutionary Biology

[20] L A Chasin ldquoSearching for splicing motifsrdquo Advances in Expe-rimental Medicine and Biology vol 623 pp 85ndash106 2007

[21] LHHungMHeiner J Hui S Schreiner V Benes andA Bin-dereif ldquoDiverse roles of hnRNP L in mammalian mRNA proc-essing a combined microarray and RNAi analysisrdquo RNA vol14 no 2 pp 284ndash296 2008

[22] M A Panaro A Cianciulli R Calvello et al ldquoAn analysis of thehuman chemokine CXC receptor 4 generdquo Immunopharmacol-ogy and Immunotoxicology vol 31 no 1 pp 88ndash93 2009

[23] J Zhang C C Kuo and L Chen ldquoGC content around splicesites affects splicing through pre-mRNA secondary structuresrdquoBMC Genomics vol 12 no 90 pp 1ndash11 2011

[24] TWNilsen ldquoThe spliceosome themost complexmacromolec-ularmachine in the cellrdquoBioEssays vol 25 no 12 pp 1147ndash11492003

[25] J F Abril R Castelo and R Guigo ldquoComparison of splice sitesin mammals and chickenrdquo Genome Research vol 15 no 1 pp111ndash119 2005

[26] N Sheth X Roca M L Hastings T Roeder A R Krainer andR Sachidanandam ldquoComprehensive splice-site analysis usingcomparative genomicsrdquo Nucleic Acids Research vol 34 no 14pp 3955ndash3967 2006

[27] S H Schwartz J Silva D Burstein T Pupko E Eyras and GAst ldquoLarge-scale comparative analysis of splicing signals andtheir corresponding splicing factors in eukaryotesrdquo GenomeResearch vol 18 no 1 pp 88ndash103 2008

[28] E V Kriventseva and M S Gelfand ldquoStatistical analysis of theexon-intron structure of higher and lower eukaryote genesrdquoJournal of Biomolecular Structure and Dynamics vol 17 no 2pp 281ndash288 1999

[29] M Burset I A Seledtsov and V V Solovyev ldquoSpliceDB dat-abase of canonical and non-canonical mammalian splice sitesrdquoNucleic Acids Research vol 29 no 1 pp 255ndash259 2001

[30] M Burset I A Seledtsov and V V Solovyev ldquoAnalysis ofcanonical and non-canonical splice sites in mammaliangenomesrdquoNucleic Acids Research vol 28 no 21 pp 4364ndash43752000

[31] V Shepelev and A Fedorov ldquoAdvances in the exon-intron dat-abase (EID)rdquo Briefings in Bioinformatics vol 7 no 2 pp 178ndash185 2006

[32] M Foote J P Hunter C M Janis and J J Sepkoski Jr ldquoEvolu-tionary and preservational constraints on origins of biologicgroups divergence times of eutherian mammalsrdquo Science vol283 no 5406 pp 1310ndash1314 1999

[33] M S Lee ldquoMolecular clock calibrations and metazoan diver-gence datesrdquo Journal of Molecular Evolution vol 49 no 3 pp385ndash391 1999

[34] K M Murphy P Travers and M Walport Janewayrsquos Immuno-biology Garland Science Oxford UK 7th edition 2007

[35] R Coico and G Sunshine Immunology A Short Course Wiley-Blackwell Hoboken NJ USA 6th edition 2008

[36] F Corpet ldquoMultiple sequence alignmentwith hierarchical clust-eringrdquo Nucleic Acids Research vol 16 no 22 pp 10881ndash108901988

[37] M L Samuels and J A Witmer Statistics for the Life SciencesPrentice Hall Upper Saddle River NJ USA 3rd edition 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 5: Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor …downloads.hindawi.com/archive/2013/818954.pdf · 2019. 7. 31. · transforming growth factor beta

International Journal of Evolutionary Biology 5

Table 3 Columns 1ndash4 the nucleotide composition according to the position in the initial hexanucleotides in Mouse and Human Columns1 and 5ndash7 analysis of the conservation of individual nucleotides in the initial hexanucleotides of the introns Expected random conservationin column 5 and actual conservation in column 6 Percentages are over the total of 216 couples of hexanucleotides

1 2 3 4 5 6 7Sequence in Mouse in Human Random conservation () Actual conservation ()GTAxxx 597 575 3403 5093 119875 lt 001

GTCxxx 19 23 004 185 119875 lt 001

GTGxxx 370 384 1421 3009 119875 lt 001

GTTxxx 14 23 003 139 119875 lt 001

Total 4831 8426 119875 lt 001

GTxAxx 824 810 6674 7778 119875 lt 001

GTxCxx 28 28 008 231 119875 lt 001

GTxGxx 69 83 057 417 119875 lt 001

GTxTxx 79 79 062 602 119875 lt 001

Total 6801 9028 119875 lt 001

GTxxAx 79 74 058 648 119875 lt 001

GTxxCx 14 18 003 093 119875 lt 001

GTxxGx 852 843 7182 8241 119875 lt 001

GTxxTx 55 65 036 370 119875 lt 001

Total 7279 9352 119875 lt 001

GTxxxA 218 241 525 1620 119875 lt 001

GTxxxC 116 74 119875 = 002 086 370 119875 lt 001

GTxxxG 166 199 330 833 119875 lt 001

GTxxxT 500 486 2430 4537 119875 lt 001

Total 3371 736 119875 lt 001

Table 4 The terminal more frequently expressed hexanucleotidesPercentages are over the total of 216 couples of hexanucleotides

Sequence in Mouse in Human Average ()TTTCAG 74 65 69CTGCAG 79 46 63TTTTAG 51 60 56TTGCAG 37 69 53TTCTAG 32 65 49CCACAG 46 32 39TTCCAG 51 28 39Total 370 365 368

reducing the level of MouseHuman conservation was thenstudied The random probability of occurrence of completehexanucleotides or groups of three or two nucleotides atgiven positions was calculated on the basis of the percentagesof the actual conservation of the individual componentnucleotides at the corresponding positions as reported inTable 3 (column 6) For example the random probability ofthe hexanucleotide GTAAGT is

10000 times 10000 times 05093 times 07778

times 08241 times 04537 = 01481 (or 1481)(1)

(binomial distribution 119899 = 216) and the random probabilityof the sequence GTAAGx is

10000 times 10000 times 05093 times 07778 times 08241

= 03265 (or 3265) (2)

The expected random probabilities were compared withthe actual frequencies Analysis of the initial hexanu-cleotides showed that the sequences GTGAGx GTGAxTand GTGxGT are significantly more expressed than could beexpected (Table 6) Analysis of all the hexanucleotides whichincluded these sequences demonstrated that in GTGAGT(which included all three overexpressed sequences) the over-expression was highly significant (expected 875 actual1389 119875 lt 001) overexpression of GTGAGG (whichincluded the GAG motif only) was less significant possiblydue to the lower frequency (expected 161 actual 324119875 = 006) None of the other hexanucleotides whichincluded some of the overexpressed sequences exhibitedsignificantly different expression levels from those expecteddue to the fact that they contained other not overexpressed orunderexpressed motifs

Sequences GTAAGx and GTAxGG were found to besignificantly less expressed than could be expected (Table 6)The hexanucleotide GTAAGG which included both under-expressed sequences was clearly underexpressed although ata low significance level due to the low frequency (expected272 actual 093 119875 = 007) All other hexanucleotideswhich included some of the underexpressed sequences didnot exhibit significantly different expression levels from thoseexpected

Another observation supports the role of the trinu-cleotides GAG and AAG in the association to overexpres-sion and underexpression respectively The hexanucleotidesGTGAGT and GTAAGT differ at the third nucleotide onlyAs at the third nucleotide the actual conservation of Ais 169 times the actual conservation of G (50933009

6 International Journal of Evolutionary Biology

Table 5 Columns 1ndash4 the nucleotide composition according to the position in the terminal hexanucleotides inMouse andHuman Columns1 and 5ndash7 analysis of the conservation of individual nucleotides in the terminal hexanucleotides of the introns Expected random conservationin column 5 and actual conservation in column 6 Percentages are over the total of 216 couples of hexanucleotides

1 2 3 4 5 6 7Sequence in Mouse in Human Random conservation () Actual conservation ()AxxxAG 51 56 029 231 119875 lt 001

CxxxAG 352 338 119 2361 119875 lt 001

GxxxAG 83 46 119875 = 001 038 370 119875 lt 001

TxxxAG 514 560 2878 4259 119875 lt 001

Total 4135 7221 119875 lt 001

xAxxAG 79 93 073 509 119875 lt 001

xCxxAG 259 222 575 1389 119875 lt 001

xGxxAG 83 83 069 509 119875 lt 001

xTxxAG 579 602 3486 4861 119875 lt 001

Total 4203 7268 119875 lt 001

xxAxAG 273 269 734 1713 119875 lt 001

xxCxAG 273 291 794 1667 119875 lt 001

xxGxAG 199 213 424 926 119875 lt 001

xxTxAG 255 227 579 1667 119875 lt 001

Total 2531 5973 119875 lt 001

xxxAAG 74 70 052 602 119875 lt 001

xxxCAG 630 597 376 5046 119875 lt 001

xxxGAG 00 00 000 000xxxTAG 296 333 986 2083 119875 lt 001

Total 4798 7731 119875 lt 001

Table 6 Conservation of groups of nucleotides (underlined) in the initial hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

GTGAGx 1929 2546 617 002GTGAxT 1062 1435 373 005GTGxGT 1125 1528 403 004

GTAAGx 3265 2639 minus626 003GTAxGG 350 093 minus257 002

Table 3) GTAAGT could be expected to be more expressedthan GTGAGT by about 70 On the contrary GTGAGTwas actually more expressed than GTAAGT by about 2(185181 Table 2) and indeed it was the most frequentlyexpressed initial hexanucleotide Similarly GTAAGG shouldbe more expressed than GTGAGG by 70 but actuallyGTGAGG was more expressed than GTAAGG by about 27(5644 Table 2)

37 Conservation of the Individual Nucleotides in the TerminalHexanucleotides of the Introns Conservationmutation inthe couples of MouseHuman orthologous terminal hexanu-cleotides was also studied In 282 only of the couples wasthere a complete identity between the Mouse and Humanorthologous hexanucleotides In a higher percentage of cases(379) one nucleotide (regardless of position) was changedA decreasing percentage of couples exhibited two or threenucleotide differences and in only 09 of cases were allfour variable nucleotides changed (Figure 4) The number ofchanges per hexanucleotide averaged 117

The probability of random occurrence of the samenucleotide at a given position in orthologous MouseHumanhexanucleotides was calculated as previously reported(Table 5 column 5) The actual occurrence () of nucleotideidentities at each position is reported in column 6 of Table 5The percentages of actual identities were always significantly(119875 lt 001) higher than the percentages of random identitiesIn the nucleotides of the terminal hexanucleotides the actualtotal conservation was about 80 higher than the calculatedpurely random conservation

38 Conservation of Groups of Nucleotides in the TerminalHexanucleotides of the Introns The contribution of groupsof bases in the terminal hexanucleotides in favoring theMouseHuman conservation was studied by comparing theirexpected random probabilities with the observed actualfrequencies (see previous paragraph for the computing ofrandom probabilities and Table 5 column 6 for the actualconservation of the individual nucleotides at the differentpositions)

International Journal of Evolutionary Biology 7

Table 7 Conservation of groups of nucleotides (underlined) in the terminal hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

TTTxAG 345 plusmn 268 880 535 lt001TTxTAG 431 plusmn 298 926 495 lt001xTTTAG 169 plusmn 189 463 294 lt001TxTTAG 148 plusmn 177 417 269 lt001xTGCAG 227 plusmn 219 509 282 001

TTxxAG 2070 plusmn 595 2963 893 lt001xxTTAG 347 plusmn 269 648 301 002xTGxAG 450 plusmn 305 833 383 lt001

0

10

20

30

40

50

60

0 1 2 3 4

Figure 4 Abscissa number of nucleotide differences (irrespectiveof the position) in orthologous MouseHuman terminal hexanu-cleotides Ordinate percentage

Analysis of the terminal hexanucleotides showed thatthe nucleotide sequences TTTxAG TTxTAG xTTTAGTxTTAG and xTGCAG are significantly more expressedthan could be expected (Table 7) Furthermore some dinu-cleotides derived from the above sequences are themselvesoverexpressed TTxxAG (derived from TTTxAG and TTx-TAG) xxTTAG (derived from xTTTAG and TxTTAG) andxTGxAG (derived from xTGCAG) (Table 7)

Three hexanucleotides were found to be significantlyoverexpressed as compared to the random probabilityTTTCAG (expected 174 actual 463 119875 lt 001)TTTTAG (expected 072 actual 370 119875 lt 001) andCTGCAG (expected 054 actual 185 119875 = 003) Allthese hexanucleotides contained at least one of the conservedmotifs Other hexanucleotides although containing some ofthe overexpressed trinucleotides did not exhibit significantlydifferent conservation levels from those of the randomprobability and in no case did the occurrence of one of theconserved dinucleotides determine a higher conservation ofthe corresponding hexanucleotides

No motif associated with a significant underexpressionwas found

39 Splicing Variants Splicing variants of the individualgenes are usually of a different type in Mouse and Humanso the positions of the splicing sites no longer correspondin the two species (ie the orthology of the splicing sites is

lost) even though the main traits of the protein products arepreserved In only one instance in receptor KIT Mouse andHuman exhibited the same type of variant that is the use ofanother 51015840ss 12 nucleotides upstream of the canonical site

In several cases one entire exon present in the canonicalform is lacking in a variant although the other downstreamexons are regularly transcribed indicating that the constitu-tive 31015840ss at the end of the preceding intron did not operate Asan example in the transcript variant X1 (XM 005252446) ofthe Human IL2RA the canonical 31015840ss TTCCAG immediatelyupstream of exon 4 is silenced and the fourth exon (216 nt72 aa) of the canonical sequence (NM 000417) is lackingin this variant In other cases up to three consecutive 31015840sssare silenced in the Human IL2RG transcript variant X2(XM 005262262) 31015840sss ATCTAG CTCTAG and CTCCAGare all silenced with loss of exons 2 3 and 4 while exons5 6 7 and 8 are transcribed as in the canonical form(NM 000206) without alteration of the reading frame

In other cases the active alternative ss may be locatedupstream or downstream of the canonical ss resulting in analteration of the length of individual exons usually withoutframeshifts For example in theHuman variant X1 of CSF2RB(XM 005261340) the intron upstream of exon 7 ends byan ldquoearlyrdquo CCTCAG while the corresponding intron in thecanonical sequence (NM 007780) ends at another CCTCAG18 nt downstream resulting in an 18-nt longer exon in thevariant Although variations of the 31015840sss are more frequentthe 51015840sss may also be variable As an example in Humantranscript variant X1 of CD40 (XM 005260617) the 51015840ss afterexon 7 is GTGGGG replacing the GTGAGT of the canonicalvariant (NM 001250) which occurs 12 nt upstream

Detailed results are shown in the SupplementaryMaterialavailable at httpdxdoiorg1011552013818954

310 The Binding of the U1 snRNA We studied the ungappedbase-pairing between the U1 snRNA sequence ACTTAC(NCBI GenBank accession code NR 004430) conserved inMouse and Human and the last three nucleotides of theexons plus the corresponding 51015840ss hexanucleotide The 51015840ssGTAAGT which exactly base-pairs with the U1 is expressedin 190 of cases in Mouse and 171 of cases in Human(Table 2) In all other cases the match is not perfect Onaverage a complementary match between the U1 nucleotidesand the four variable nucleotides of the 51015840ss is achieved in

8 International Journal of Evolutionary Biology

67 of nucleotides The highest match is with the secondvariable nucleotide (the fourth of the 51015840ss) with 78 ofoccurrences while the matching of all the other variablenucleotides averages 50 Indeed all seven more expressed51015840ss of Mouse and Human have an ldquoArdquo in the fourth position(Table 2) The lowest matching is with the sixth nucleotideHowever the number of matches in the individual 51015840sss isvery variable and in a few cases (eg GTTTCG) only theinitial GTmatcheswithU1 In very few caseswe observed thatthe optimal ungapped base-pairing of U1 could be achievedby partly binding the last three nucleotides of the exon

4 Discussion

This study was aimed at describing the conservationmuta-tion dynamics of the intron ends of the cytokine receptorgenes during the speciation processes to Mouse and Humanwhich began starting from a common ancestor 65ndash85 MYA[32 33] We selected 26 orthologous genes of the two speciesin which the MouseHuman topographic correspondence ofexons had been consistently preserved The selected genescoded for receptors belonging to different cytokine receptorgroups (Table 1)We firstly identified the nucleotide identitiesat the corresponding positions in Mouse and Human in thefirst 50 and last 50 nucleotides of each intron All intronsstudied (216 couples) started and ended with the canonicaldinucleotides GT and AG respectively conforming to theusual situation This preliminary analysis demonstrated thatin general the four nucleotides following GT at the 51015840ss andthe four nucleotides preceding AG at the 31015840ss were morehighly conserved as compared to the other nucleotides atboth ends of the introns (Figures 1-2) We thus undertooka more detailed analysis of the initial and terminal intronichexanucleotides

All the 16 hexanucleotides beginning by GT and endingby AG might act both as 51015840ss and 31015840ss In our sample 8 ofthese sequences appeared either at 51015840 or at 31015840 but never inboth It may thus be hypothesized that some mechanism ofdisambiguation may operate possibly related to other exonicor intronic ldquosignalsrdquo involved in the splicing processes

The structure of the hexanucleotides actually expressed atthe beginning and end of the introns is quite variable even ifsome configurations are more frequent (Tables 2 and 4) Inour sample the initial hexanucleotide appeared in 51 differentconfigurations while the terminal hexanucleotide appearedin 94 different configurations indicating a lower selectivepressure on the terminal hexanucleotideThis is confirmed bythe MouseHuman conservationmutation data the averagemutation per hexanucleotide was 058 in the initial hexanu-cleotides and 117 in the terminal hexanucleotides and thetotal hexanucleotide conservation was 574 for initial hex-anucleotides and only 282 for terminal hexanucleotides

The percentages of occurrence of the more expressedinitial and terminal hexanucleotides did not differ signifi-cantly inMouse and Human (Tables 2 and 4)The percentageof the individual nucleotides at each position in the initialand terminal hexanucleotides was also similar in Mouseand Human and our results are in general comparable to

those reported in the literature [25 27 28] In our samplein the sixth nucleotide of the initial hexanucleotides C wassignificantly more expressed in the Mouse than in Humanand in the first nucleotide of the terminal hexanucleotidesG was significantly more expressed in the Mouse than inHuman (Tables 3 and 5) In our sample the nucleotideG was never present in the fourth position of terminalhexanucleotides

From the data on the actual frequencies of the indi-vidual nucleotides at each position in the initial and ter-minal hexanucleotides in Mouse and Human separately wecalculated the probability of random conservation of anygiven nucleotide at a given position in both species Theresults are shown in Table 3 (initial hexanucleotides) andTable 5 (terminal hexanucleotides) together with the resultsobtained on the actual conservation of each nucleotide ata given position in both Mouse and Human As shown inthe Tables the actual conservation was always significantlyhigher than the random conservation On average the actualconservation exceeded by 53 the random conservationin the initial hexanucleotides and by 80 in the terminalhexanucleotides These ldquogainsrdquo express the conservation dueto a selective evolutionary pressure to keep the configurationof the common ancestor at the level of the individualnucleotides

We also analyzed the conservation of groups of the vari-able nucleotides in initial and terminal hexanucleotides Therandom conservation of groups of three or two nucleotideswas calculated and compared with the actual frequenciesAll the motifs of the initial hexanucleotides found to beassociated with a higher or a lower conservation are reportedin the Results section The most remarkably conservedtrinucleotide was xxGAGx while xxAAGx was significantlyunderexpressed Accordingly GTGAGT and GTGAGGwereoverexpressed while GTAAGG was underexpressed Since inthe third position A is more conserved than G by about70 GTGAGT and GTGAGG should be less expressedthan GTAAGT and GTAAGG respectively by a similarpercentage Actually GTGAGT and GTGAGG were moreexpressed than the similar sequences with A in the thirdposition so that in our sample GTGAGT was the mostfrequently expressed initial hexanucleotide and GTAAGTwas only the second most frequently occurring sequence(Table 2)

In the terminal hexanucleotides the sequences TTTxAGand xTGCAG were overexpressed and accordinglyTTTCAG TTTTAG and CTGCAG were more expressedthan the random expectance These three sequences arethe most frequently expressed terminal hexanucleotides(Table 4) No specific motif entailing terminal nucleotideunderexpression could be demonstrated

It should be remarked that the presence of overex-pressed or underexpressed trinucleotides seems to be acondition which although necessary is not sufficient perse to entail over- or underexpression of the correspondinghexanucleotides since the outcome also depends on the othervariable nucleotide

The present results suggest that the conservation of eachnucleotide of the initial and terminal hexanucleotides is

International Journal of Evolutionary Biology 9

dependent upon the conservationmutation of the othernucleotides and an evolutionary selective pressure oper-ates to keep certain global configurations derived from theMouseHuman common ancestor while other configurationstend to be less conserved In particular the initial con-figuration GTAAGT which corresponded to the maximalconservation of the individual nucleotides at positions 3 to6 (Table 3 column 6) and is the one which exactly base-pairswith U1 snRNA appeared to be negatively biased

In general the requirement for base-pairing of the initialhexanucleotide with U1 seems not to be stringent since inour sample complete matching was observed in 18 only ofthe 51015840sss and 51015840sss with as few as three matching nucleotidesbeing effective (13 of cases) In very few cases optimal base-pairing ofU1 could be achieved by partly binding the terminaltrinucleotide of the exon

Analysis of the transcript variants of the genes consideredin the present study revealed that no type of variant iscommon in Mouse and Human except in one single caseThis suggests that these variants were not inherited fromthe common ancestor but emerged only later during thespeciation process One frequent variant is the silencingof a constitutive 31015840ss which leads to the skipping of theimmediately downstream exon with transcription being thenresumed for the other downstream exons Another type ofvariant is the activation of an alternative 51015840ss or 31015840ss usuallylocated a few nucleotides upstream or downstream of thecanonical splicing site the latter being silenced This leadsto alterations of the nucleotide number of the neighboringexons but usually without frameshifts Most constitutive sssespecially at 51015840 are robust enough to be maintained steadilyactive but 104 of the 31015840sss and 32 of the 51015840sss were foundto be variable Our analysis could not reveal any specific trendtowards the silencing of constitutive sss or the activation ofalternative sss as certain hexanucleotides which are silencedat given sites are on the contrary activated at other sites toreplace constitutive sss as for example the 51015840sss GTGAGAand GTGGGA and the 31015840ss CTCCAG Possibly silencingor activation may depend in a complex way on the generalcontext [20]

To conclude analysis of the nucleotide conservation inthe 51015840ss and 31015840ss hexanucleotides of Mouse and Humanintrons reveals definite evolutionary positive biases towardsthe conservation of some specific configurations and neg-ative biases against other configurations However the 51015840sshexanucleotides and especially the 31015840ss hexanucleotides stillexhibit a wide structural variability confirming the con-tention that splicing depends on a complex code whichbesides the 51015840ss and 31015840ss possibly involves additional signalsfrom the neighboring exons or from sections of the intronother than the splice sites and might also be regulated by thesecondary and tertiary structures forming in the pre-mRNAmolecule [12ndash14 16ndash23]

Acknowledgments

The authors gratefully thank Doctor Mary Victoria CandacePragnell and Professor Vincenzo Mitolo for constructivecriticisms and reviewing the paper

References

[1] S Stamm S Ben-Ari I Rafalska et al ldquoFunction of alternativesplicingrdquo Gene vol 344 pp 1ndash20 2005

[2] A J Matlin F Clark and C W J Smith ldquoUnderstanding alte-rnative splicing towards a cellular coderdquoNature ReviewsMolec-ular Cell Biology vol 6 no 5 pp 386ndash398 2005

[3] M Hallegger M Llorian and C W J Smith ldquoAlternativesplicing global insights minireviewrdquo FEBS Journal vol 277 no4 pp 856ndash866 2010

[4] E Conti and E Izaurralde ldquoNonsense-mediated mRNA decaymolecular insights and mechanistic variations across speciesrdquoCurrent Opinion in Cell Biology vol 17 no 3 pp 316ndash325 2005

[5] F Lejeune and L E Maquat ldquoMechanistic links between non-sense-mediated mRNA decay and pre-mRNA splicing in mam-malian cellsrdquo Current Opinion in Cell Biology vol 17 no 3 pp309ndash315 2005

[6] J Weischenfeldt J Lykke-Andersen and B Porse ldquoMessengerRNA surveillance neutralizing natural nonsenserdquoCurrent Biol-ogy vol 15 no 14 pp R559ndashR562 2005

[7] E V Koonin ldquoThe origin of introns and their role in eukaryo-genesis a compromise solution to the introns-early versusintrons-late debaterdquoBiologyDirect vol 1 no 22 pp 1ndash23 2006

[8] J P Orengo and T A Cooper ldquoAlternative splicing in diseaserdquoAdvances in Experimental Medicine and Biology vol 623 pp212ndash223 2007

[9] A S Solis N Shariat and J G Patton ldquoSplicing fidelity enhan-cers and diseaserdquo Frontiers in Bioscience vol 13 no 5 pp 1926ndash1942 2008

[10] C A Pettigrew and M A Brown ldquoPre-mRNA splicing aberra-tions and cancerrdquo Frontiers in Bioscience vol 13 no 3 pp 1090ndash1105 2008

[11] M A Panaro V Mitolo A Cianciulli P Cavallo C I MitoloandAAcquafredda ldquoTheHIV-1 Rev binding family of proteinsthe dog proteins as a study modelrdquo Endocrine Metabolic andImmune Disorders vol 8 no 1 pp 30ndash46 2008

[12] M D Adams D Z Rudner and D C Rio ldquoBiochemistry andregulation of pre-mRNA splicingrdquo Current Opinion in CellBiology vol 8 no 3 pp 331ndash339 1996

[13] A R Krainer Eukaryotic mRNA Processing Oxford UniversityPress New York NY USA 1997

[14] C B Burge T Tuschl and P A Sharp Splicing of Precursors tomRNAs by the Spliceosome Cold Spring Laboratory HarborPress Cold Spring Harbor NY USA 1999

[15] B J Blencowe ldquoExonic splicing enhancers mechanism ofaction diversity and role in human genetic diseasesrdquo Trends inBiochemical Sciences vol 25 no 3 pp 106ndash110 2000

[16] X H Zhang and L A Chasin ldquoComputational definition ofsequence motifs governing constitutive exon splicingrdquo Genesand Development vol 18 no 11 pp 1241ndash1250 2004

[17] J Hui L Hung M Heiner et al ldquoIntronic CA-repeat and CA-rich elements a new class of regulators of mammalian alterna-tive splicingrdquoEMBO Journal vol 24 no 11 pp 1988ndash1998 2005

[18] J L Kabat S Barberan-Soler P McKenna H Clawson TFarrer and A M Zahler ldquoIntronic alternative splicing regula-tors identified by comparative genomics in nematodesrdquo PLoSComputational Biology vol 2 no 7 p e86 2006

[19] C Dominguez and F H-T Allain ldquoNMR structure of the threequasi RNA recognition motifs (qRRMs) of human hnRNP Fand interaction studies with Bcl-x G-tract RNA a novel modeof RNA recognitionrdquo Nucleic Acids Research vol 34 no 13 pp3634ndash3645 2006

10 International Journal of Evolutionary Biology

[20] L A Chasin ldquoSearching for splicing motifsrdquo Advances in Expe-rimental Medicine and Biology vol 623 pp 85ndash106 2007

[21] LHHungMHeiner J Hui S Schreiner V Benes andA Bin-dereif ldquoDiverse roles of hnRNP L in mammalian mRNA proc-essing a combined microarray and RNAi analysisrdquo RNA vol14 no 2 pp 284ndash296 2008

[22] M A Panaro A Cianciulli R Calvello et al ldquoAn analysis of thehuman chemokine CXC receptor 4 generdquo Immunopharmacol-ogy and Immunotoxicology vol 31 no 1 pp 88ndash93 2009

[23] J Zhang C C Kuo and L Chen ldquoGC content around splicesites affects splicing through pre-mRNA secondary structuresrdquoBMC Genomics vol 12 no 90 pp 1ndash11 2011

[24] TWNilsen ldquoThe spliceosome themost complexmacromolec-ularmachine in the cellrdquoBioEssays vol 25 no 12 pp 1147ndash11492003

[25] J F Abril R Castelo and R Guigo ldquoComparison of splice sitesin mammals and chickenrdquo Genome Research vol 15 no 1 pp111ndash119 2005

[26] N Sheth X Roca M L Hastings T Roeder A R Krainer andR Sachidanandam ldquoComprehensive splice-site analysis usingcomparative genomicsrdquo Nucleic Acids Research vol 34 no 14pp 3955ndash3967 2006

[27] S H Schwartz J Silva D Burstein T Pupko E Eyras and GAst ldquoLarge-scale comparative analysis of splicing signals andtheir corresponding splicing factors in eukaryotesrdquo GenomeResearch vol 18 no 1 pp 88ndash103 2008

[28] E V Kriventseva and M S Gelfand ldquoStatistical analysis of theexon-intron structure of higher and lower eukaryote genesrdquoJournal of Biomolecular Structure and Dynamics vol 17 no 2pp 281ndash288 1999

[29] M Burset I A Seledtsov and V V Solovyev ldquoSpliceDB dat-abase of canonical and non-canonical mammalian splice sitesrdquoNucleic Acids Research vol 29 no 1 pp 255ndash259 2001

[30] M Burset I A Seledtsov and V V Solovyev ldquoAnalysis ofcanonical and non-canonical splice sites in mammaliangenomesrdquoNucleic Acids Research vol 28 no 21 pp 4364ndash43752000

[31] V Shepelev and A Fedorov ldquoAdvances in the exon-intron dat-abase (EID)rdquo Briefings in Bioinformatics vol 7 no 2 pp 178ndash185 2006

[32] M Foote J P Hunter C M Janis and J J Sepkoski Jr ldquoEvolu-tionary and preservational constraints on origins of biologicgroups divergence times of eutherian mammalsrdquo Science vol283 no 5406 pp 1310ndash1314 1999

[33] M S Lee ldquoMolecular clock calibrations and metazoan diver-gence datesrdquo Journal of Molecular Evolution vol 49 no 3 pp385ndash391 1999

[34] K M Murphy P Travers and M Walport Janewayrsquos Immuno-biology Garland Science Oxford UK 7th edition 2007

[35] R Coico and G Sunshine Immunology A Short Course Wiley-Blackwell Hoboken NJ USA 6th edition 2008

[36] F Corpet ldquoMultiple sequence alignmentwith hierarchical clust-eringrdquo Nucleic Acids Research vol 16 no 22 pp 10881ndash108901988

[37] M L Samuels and J A Witmer Statistics for the Life SciencesPrentice Hall Upper Saddle River NJ USA 3rd edition 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor …downloads.hindawi.com/archive/2013/818954.pdf · 2019. 7. 31. · transforming growth factor beta

6 International Journal of Evolutionary Biology

Table 5 Columns 1ndash4 the nucleotide composition according to the position in the terminal hexanucleotides inMouse andHuman Columns1 and 5ndash7 analysis of the conservation of individual nucleotides in the terminal hexanucleotides of the introns Expected random conservationin column 5 and actual conservation in column 6 Percentages are over the total of 216 couples of hexanucleotides

1 2 3 4 5 6 7Sequence in Mouse in Human Random conservation () Actual conservation ()AxxxAG 51 56 029 231 119875 lt 001

CxxxAG 352 338 119 2361 119875 lt 001

GxxxAG 83 46 119875 = 001 038 370 119875 lt 001

TxxxAG 514 560 2878 4259 119875 lt 001

Total 4135 7221 119875 lt 001

xAxxAG 79 93 073 509 119875 lt 001

xCxxAG 259 222 575 1389 119875 lt 001

xGxxAG 83 83 069 509 119875 lt 001

xTxxAG 579 602 3486 4861 119875 lt 001

Total 4203 7268 119875 lt 001

xxAxAG 273 269 734 1713 119875 lt 001

xxCxAG 273 291 794 1667 119875 lt 001

xxGxAG 199 213 424 926 119875 lt 001

xxTxAG 255 227 579 1667 119875 lt 001

Total 2531 5973 119875 lt 001

xxxAAG 74 70 052 602 119875 lt 001

xxxCAG 630 597 376 5046 119875 lt 001

xxxGAG 00 00 000 000xxxTAG 296 333 986 2083 119875 lt 001

Total 4798 7731 119875 lt 001

Table 6 Conservation of groups of nucleotides (underlined) in the initial hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

GTGAGx 1929 2546 617 002GTGAxT 1062 1435 373 005GTGxGT 1125 1528 403 004

GTAAGx 3265 2639 minus626 003GTAxGG 350 093 minus257 002

Table 3) GTAAGT could be expected to be more expressedthan GTGAGT by about 70 On the contrary GTGAGTwas actually more expressed than GTAAGT by about 2(185181 Table 2) and indeed it was the most frequentlyexpressed initial hexanucleotide Similarly GTAAGG shouldbe more expressed than GTGAGG by 70 but actuallyGTGAGG was more expressed than GTAAGG by about 27(5644 Table 2)

37 Conservation of the Individual Nucleotides in the TerminalHexanucleotides of the Introns Conservationmutation inthe couples of MouseHuman orthologous terminal hexanu-cleotides was also studied In 282 only of the couples wasthere a complete identity between the Mouse and Humanorthologous hexanucleotides In a higher percentage of cases(379) one nucleotide (regardless of position) was changedA decreasing percentage of couples exhibited two or threenucleotide differences and in only 09 of cases were allfour variable nucleotides changed (Figure 4) The number ofchanges per hexanucleotide averaged 117

The probability of random occurrence of the samenucleotide at a given position in orthologous MouseHumanhexanucleotides was calculated as previously reported(Table 5 column 5) The actual occurrence () of nucleotideidentities at each position is reported in column 6 of Table 5The percentages of actual identities were always significantly(119875 lt 001) higher than the percentages of random identitiesIn the nucleotides of the terminal hexanucleotides the actualtotal conservation was about 80 higher than the calculatedpurely random conservation

38 Conservation of Groups of Nucleotides in the TerminalHexanucleotides of the Introns The contribution of groupsof bases in the terminal hexanucleotides in favoring theMouseHuman conservation was studied by comparing theirexpected random probabilities with the observed actualfrequencies (see previous paragraph for the computing ofrandom probabilities and Table 5 column 6 for the actualconservation of the individual nucleotides at the differentpositions)

International Journal of Evolutionary Biology 7

Table 7 Conservation of groups of nucleotides (underlined) in the terminal hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

TTTxAG 345 plusmn 268 880 535 lt001TTxTAG 431 plusmn 298 926 495 lt001xTTTAG 169 plusmn 189 463 294 lt001TxTTAG 148 plusmn 177 417 269 lt001xTGCAG 227 plusmn 219 509 282 001

TTxxAG 2070 plusmn 595 2963 893 lt001xxTTAG 347 plusmn 269 648 301 002xTGxAG 450 plusmn 305 833 383 lt001

0

10

20

30

40

50

60

0 1 2 3 4

Figure 4 Abscissa number of nucleotide differences (irrespectiveof the position) in orthologous MouseHuman terminal hexanu-cleotides Ordinate percentage

Analysis of the terminal hexanucleotides showed thatthe nucleotide sequences TTTxAG TTxTAG xTTTAGTxTTAG and xTGCAG are significantly more expressedthan could be expected (Table 7) Furthermore some dinu-cleotides derived from the above sequences are themselvesoverexpressed TTxxAG (derived from TTTxAG and TTx-TAG) xxTTAG (derived from xTTTAG and TxTTAG) andxTGxAG (derived from xTGCAG) (Table 7)

Three hexanucleotides were found to be significantlyoverexpressed as compared to the random probabilityTTTCAG (expected 174 actual 463 119875 lt 001)TTTTAG (expected 072 actual 370 119875 lt 001) andCTGCAG (expected 054 actual 185 119875 = 003) Allthese hexanucleotides contained at least one of the conservedmotifs Other hexanucleotides although containing some ofthe overexpressed trinucleotides did not exhibit significantlydifferent conservation levels from those of the randomprobability and in no case did the occurrence of one of theconserved dinucleotides determine a higher conservation ofthe corresponding hexanucleotides

No motif associated with a significant underexpressionwas found

39 Splicing Variants Splicing variants of the individualgenes are usually of a different type in Mouse and Humanso the positions of the splicing sites no longer correspondin the two species (ie the orthology of the splicing sites is

lost) even though the main traits of the protein products arepreserved In only one instance in receptor KIT Mouse andHuman exhibited the same type of variant that is the use ofanother 51015840ss 12 nucleotides upstream of the canonical site

In several cases one entire exon present in the canonicalform is lacking in a variant although the other downstreamexons are regularly transcribed indicating that the constitu-tive 31015840ss at the end of the preceding intron did not operate Asan example in the transcript variant X1 (XM 005252446) ofthe Human IL2RA the canonical 31015840ss TTCCAG immediatelyupstream of exon 4 is silenced and the fourth exon (216 nt72 aa) of the canonical sequence (NM 000417) is lackingin this variant In other cases up to three consecutive 31015840sssare silenced in the Human IL2RG transcript variant X2(XM 005262262) 31015840sss ATCTAG CTCTAG and CTCCAGare all silenced with loss of exons 2 3 and 4 while exons5 6 7 and 8 are transcribed as in the canonical form(NM 000206) without alteration of the reading frame

In other cases the active alternative ss may be locatedupstream or downstream of the canonical ss resulting in analteration of the length of individual exons usually withoutframeshifts For example in theHuman variant X1 of CSF2RB(XM 005261340) the intron upstream of exon 7 ends byan ldquoearlyrdquo CCTCAG while the corresponding intron in thecanonical sequence (NM 007780) ends at another CCTCAG18 nt downstream resulting in an 18-nt longer exon in thevariant Although variations of the 31015840sss are more frequentthe 51015840sss may also be variable As an example in Humantranscript variant X1 of CD40 (XM 005260617) the 51015840ss afterexon 7 is GTGGGG replacing the GTGAGT of the canonicalvariant (NM 001250) which occurs 12 nt upstream

Detailed results are shown in the SupplementaryMaterialavailable at httpdxdoiorg1011552013818954

310 The Binding of the U1 snRNA We studied the ungappedbase-pairing between the U1 snRNA sequence ACTTAC(NCBI GenBank accession code NR 004430) conserved inMouse and Human and the last three nucleotides of theexons plus the corresponding 51015840ss hexanucleotide The 51015840ssGTAAGT which exactly base-pairs with the U1 is expressedin 190 of cases in Mouse and 171 of cases in Human(Table 2) In all other cases the match is not perfect Onaverage a complementary match between the U1 nucleotidesand the four variable nucleotides of the 51015840ss is achieved in

8 International Journal of Evolutionary Biology

67 of nucleotides The highest match is with the secondvariable nucleotide (the fourth of the 51015840ss) with 78 ofoccurrences while the matching of all the other variablenucleotides averages 50 Indeed all seven more expressed51015840ss of Mouse and Human have an ldquoArdquo in the fourth position(Table 2) The lowest matching is with the sixth nucleotideHowever the number of matches in the individual 51015840sss isvery variable and in a few cases (eg GTTTCG) only theinitial GTmatcheswithU1 In very few caseswe observed thatthe optimal ungapped base-pairing of U1 could be achievedby partly binding the last three nucleotides of the exon

4 Discussion

This study was aimed at describing the conservationmuta-tion dynamics of the intron ends of the cytokine receptorgenes during the speciation processes to Mouse and Humanwhich began starting from a common ancestor 65ndash85 MYA[32 33] We selected 26 orthologous genes of the two speciesin which the MouseHuman topographic correspondence ofexons had been consistently preserved The selected genescoded for receptors belonging to different cytokine receptorgroups (Table 1)We firstly identified the nucleotide identitiesat the corresponding positions in Mouse and Human in thefirst 50 and last 50 nucleotides of each intron All intronsstudied (216 couples) started and ended with the canonicaldinucleotides GT and AG respectively conforming to theusual situation This preliminary analysis demonstrated thatin general the four nucleotides following GT at the 51015840ss andthe four nucleotides preceding AG at the 31015840ss were morehighly conserved as compared to the other nucleotides atboth ends of the introns (Figures 1-2) We thus undertooka more detailed analysis of the initial and terminal intronichexanucleotides

All the 16 hexanucleotides beginning by GT and endingby AG might act both as 51015840ss and 31015840ss In our sample 8 ofthese sequences appeared either at 51015840 or at 31015840 but never inboth It may thus be hypothesized that some mechanism ofdisambiguation may operate possibly related to other exonicor intronic ldquosignalsrdquo involved in the splicing processes

The structure of the hexanucleotides actually expressed atthe beginning and end of the introns is quite variable even ifsome configurations are more frequent (Tables 2 and 4) Inour sample the initial hexanucleotide appeared in 51 differentconfigurations while the terminal hexanucleotide appearedin 94 different configurations indicating a lower selectivepressure on the terminal hexanucleotideThis is confirmed bythe MouseHuman conservationmutation data the averagemutation per hexanucleotide was 058 in the initial hexanu-cleotides and 117 in the terminal hexanucleotides and thetotal hexanucleotide conservation was 574 for initial hex-anucleotides and only 282 for terminal hexanucleotides

The percentages of occurrence of the more expressedinitial and terminal hexanucleotides did not differ signifi-cantly inMouse and Human (Tables 2 and 4)The percentageof the individual nucleotides at each position in the initialand terminal hexanucleotides was also similar in Mouseand Human and our results are in general comparable to

those reported in the literature [25 27 28] In our samplein the sixth nucleotide of the initial hexanucleotides C wassignificantly more expressed in the Mouse than in Humanand in the first nucleotide of the terminal hexanucleotidesG was significantly more expressed in the Mouse than inHuman (Tables 3 and 5) In our sample the nucleotideG was never present in the fourth position of terminalhexanucleotides

From the data on the actual frequencies of the indi-vidual nucleotides at each position in the initial and ter-minal hexanucleotides in Mouse and Human separately wecalculated the probability of random conservation of anygiven nucleotide at a given position in both species Theresults are shown in Table 3 (initial hexanucleotides) andTable 5 (terminal hexanucleotides) together with the resultsobtained on the actual conservation of each nucleotide ata given position in both Mouse and Human As shown inthe Tables the actual conservation was always significantlyhigher than the random conservation On average the actualconservation exceeded by 53 the random conservationin the initial hexanucleotides and by 80 in the terminalhexanucleotides These ldquogainsrdquo express the conservation dueto a selective evolutionary pressure to keep the configurationof the common ancestor at the level of the individualnucleotides

We also analyzed the conservation of groups of the vari-able nucleotides in initial and terminal hexanucleotides Therandom conservation of groups of three or two nucleotideswas calculated and compared with the actual frequenciesAll the motifs of the initial hexanucleotides found to beassociated with a higher or a lower conservation are reportedin the Results section The most remarkably conservedtrinucleotide was xxGAGx while xxAAGx was significantlyunderexpressed Accordingly GTGAGT and GTGAGGwereoverexpressed while GTAAGG was underexpressed Since inthe third position A is more conserved than G by about70 GTGAGT and GTGAGG should be less expressedthan GTAAGT and GTAAGG respectively by a similarpercentage Actually GTGAGT and GTGAGG were moreexpressed than the similar sequences with A in the thirdposition so that in our sample GTGAGT was the mostfrequently expressed initial hexanucleotide and GTAAGTwas only the second most frequently occurring sequence(Table 2)

In the terminal hexanucleotides the sequences TTTxAGand xTGCAG were overexpressed and accordinglyTTTCAG TTTTAG and CTGCAG were more expressedthan the random expectance These three sequences arethe most frequently expressed terminal hexanucleotides(Table 4) No specific motif entailing terminal nucleotideunderexpression could be demonstrated

It should be remarked that the presence of overex-pressed or underexpressed trinucleotides seems to be acondition which although necessary is not sufficient perse to entail over- or underexpression of the correspondinghexanucleotides since the outcome also depends on the othervariable nucleotide

The present results suggest that the conservation of eachnucleotide of the initial and terminal hexanucleotides is

International Journal of Evolutionary Biology 9

dependent upon the conservationmutation of the othernucleotides and an evolutionary selective pressure oper-ates to keep certain global configurations derived from theMouseHuman common ancestor while other configurationstend to be less conserved In particular the initial con-figuration GTAAGT which corresponded to the maximalconservation of the individual nucleotides at positions 3 to6 (Table 3 column 6) and is the one which exactly base-pairswith U1 snRNA appeared to be negatively biased

In general the requirement for base-pairing of the initialhexanucleotide with U1 seems not to be stringent since inour sample complete matching was observed in 18 only ofthe 51015840sss and 51015840sss with as few as three matching nucleotidesbeing effective (13 of cases) In very few cases optimal base-pairing ofU1 could be achieved by partly binding the terminaltrinucleotide of the exon

Analysis of the transcript variants of the genes consideredin the present study revealed that no type of variant iscommon in Mouse and Human except in one single caseThis suggests that these variants were not inherited fromthe common ancestor but emerged only later during thespeciation process One frequent variant is the silencingof a constitutive 31015840ss which leads to the skipping of theimmediately downstream exon with transcription being thenresumed for the other downstream exons Another type ofvariant is the activation of an alternative 51015840ss or 31015840ss usuallylocated a few nucleotides upstream or downstream of thecanonical splicing site the latter being silenced This leadsto alterations of the nucleotide number of the neighboringexons but usually without frameshifts Most constitutive sssespecially at 51015840 are robust enough to be maintained steadilyactive but 104 of the 31015840sss and 32 of the 51015840sss were foundto be variable Our analysis could not reveal any specific trendtowards the silencing of constitutive sss or the activation ofalternative sss as certain hexanucleotides which are silencedat given sites are on the contrary activated at other sites toreplace constitutive sss as for example the 51015840sss GTGAGAand GTGGGA and the 31015840ss CTCCAG Possibly silencingor activation may depend in a complex way on the generalcontext [20]

To conclude analysis of the nucleotide conservation inthe 51015840ss and 31015840ss hexanucleotides of Mouse and Humanintrons reveals definite evolutionary positive biases towardsthe conservation of some specific configurations and neg-ative biases against other configurations However the 51015840sshexanucleotides and especially the 31015840ss hexanucleotides stillexhibit a wide structural variability confirming the con-tention that splicing depends on a complex code whichbesides the 51015840ss and 31015840ss possibly involves additional signalsfrom the neighboring exons or from sections of the intronother than the splice sites and might also be regulated by thesecondary and tertiary structures forming in the pre-mRNAmolecule [12ndash14 16ndash23]

Acknowledgments

The authors gratefully thank Doctor Mary Victoria CandacePragnell and Professor Vincenzo Mitolo for constructivecriticisms and reviewing the paper

References

[1] S Stamm S Ben-Ari I Rafalska et al ldquoFunction of alternativesplicingrdquo Gene vol 344 pp 1ndash20 2005

[2] A J Matlin F Clark and C W J Smith ldquoUnderstanding alte-rnative splicing towards a cellular coderdquoNature ReviewsMolec-ular Cell Biology vol 6 no 5 pp 386ndash398 2005

[3] M Hallegger M Llorian and C W J Smith ldquoAlternativesplicing global insights minireviewrdquo FEBS Journal vol 277 no4 pp 856ndash866 2010

[4] E Conti and E Izaurralde ldquoNonsense-mediated mRNA decaymolecular insights and mechanistic variations across speciesrdquoCurrent Opinion in Cell Biology vol 17 no 3 pp 316ndash325 2005

[5] F Lejeune and L E Maquat ldquoMechanistic links between non-sense-mediated mRNA decay and pre-mRNA splicing in mam-malian cellsrdquo Current Opinion in Cell Biology vol 17 no 3 pp309ndash315 2005

[6] J Weischenfeldt J Lykke-Andersen and B Porse ldquoMessengerRNA surveillance neutralizing natural nonsenserdquoCurrent Biol-ogy vol 15 no 14 pp R559ndashR562 2005

[7] E V Koonin ldquoThe origin of introns and their role in eukaryo-genesis a compromise solution to the introns-early versusintrons-late debaterdquoBiologyDirect vol 1 no 22 pp 1ndash23 2006

[8] J P Orengo and T A Cooper ldquoAlternative splicing in diseaserdquoAdvances in Experimental Medicine and Biology vol 623 pp212ndash223 2007

[9] A S Solis N Shariat and J G Patton ldquoSplicing fidelity enhan-cers and diseaserdquo Frontiers in Bioscience vol 13 no 5 pp 1926ndash1942 2008

[10] C A Pettigrew and M A Brown ldquoPre-mRNA splicing aberra-tions and cancerrdquo Frontiers in Bioscience vol 13 no 3 pp 1090ndash1105 2008

[11] M A Panaro V Mitolo A Cianciulli P Cavallo C I MitoloandAAcquafredda ldquoTheHIV-1 Rev binding family of proteinsthe dog proteins as a study modelrdquo Endocrine Metabolic andImmune Disorders vol 8 no 1 pp 30ndash46 2008

[12] M D Adams D Z Rudner and D C Rio ldquoBiochemistry andregulation of pre-mRNA splicingrdquo Current Opinion in CellBiology vol 8 no 3 pp 331ndash339 1996

[13] A R Krainer Eukaryotic mRNA Processing Oxford UniversityPress New York NY USA 1997

[14] C B Burge T Tuschl and P A Sharp Splicing of Precursors tomRNAs by the Spliceosome Cold Spring Laboratory HarborPress Cold Spring Harbor NY USA 1999

[15] B J Blencowe ldquoExonic splicing enhancers mechanism ofaction diversity and role in human genetic diseasesrdquo Trends inBiochemical Sciences vol 25 no 3 pp 106ndash110 2000

[16] X H Zhang and L A Chasin ldquoComputational definition ofsequence motifs governing constitutive exon splicingrdquo Genesand Development vol 18 no 11 pp 1241ndash1250 2004

[17] J Hui L Hung M Heiner et al ldquoIntronic CA-repeat and CA-rich elements a new class of regulators of mammalian alterna-tive splicingrdquoEMBO Journal vol 24 no 11 pp 1988ndash1998 2005

[18] J L Kabat S Barberan-Soler P McKenna H Clawson TFarrer and A M Zahler ldquoIntronic alternative splicing regula-tors identified by comparative genomics in nematodesrdquo PLoSComputational Biology vol 2 no 7 p e86 2006

[19] C Dominguez and F H-T Allain ldquoNMR structure of the threequasi RNA recognition motifs (qRRMs) of human hnRNP Fand interaction studies with Bcl-x G-tract RNA a novel modeof RNA recognitionrdquo Nucleic Acids Research vol 34 no 13 pp3634ndash3645 2006

10 International Journal of Evolutionary Biology

[20] L A Chasin ldquoSearching for splicing motifsrdquo Advances in Expe-rimental Medicine and Biology vol 623 pp 85ndash106 2007

[21] LHHungMHeiner J Hui S Schreiner V Benes andA Bin-dereif ldquoDiverse roles of hnRNP L in mammalian mRNA proc-essing a combined microarray and RNAi analysisrdquo RNA vol14 no 2 pp 284ndash296 2008

[22] M A Panaro A Cianciulli R Calvello et al ldquoAn analysis of thehuman chemokine CXC receptor 4 generdquo Immunopharmacol-ogy and Immunotoxicology vol 31 no 1 pp 88ndash93 2009

[23] J Zhang C C Kuo and L Chen ldquoGC content around splicesites affects splicing through pre-mRNA secondary structuresrdquoBMC Genomics vol 12 no 90 pp 1ndash11 2011

[24] TWNilsen ldquoThe spliceosome themost complexmacromolec-ularmachine in the cellrdquoBioEssays vol 25 no 12 pp 1147ndash11492003

[25] J F Abril R Castelo and R Guigo ldquoComparison of splice sitesin mammals and chickenrdquo Genome Research vol 15 no 1 pp111ndash119 2005

[26] N Sheth X Roca M L Hastings T Roeder A R Krainer andR Sachidanandam ldquoComprehensive splice-site analysis usingcomparative genomicsrdquo Nucleic Acids Research vol 34 no 14pp 3955ndash3967 2006

[27] S H Schwartz J Silva D Burstein T Pupko E Eyras and GAst ldquoLarge-scale comparative analysis of splicing signals andtheir corresponding splicing factors in eukaryotesrdquo GenomeResearch vol 18 no 1 pp 88ndash103 2008

[28] E V Kriventseva and M S Gelfand ldquoStatistical analysis of theexon-intron structure of higher and lower eukaryote genesrdquoJournal of Biomolecular Structure and Dynamics vol 17 no 2pp 281ndash288 1999

[29] M Burset I A Seledtsov and V V Solovyev ldquoSpliceDB dat-abase of canonical and non-canonical mammalian splice sitesrdquoNucleic Acids Research vol 29 no 1 pp 255ndash259 2001

[30] M Burset I A Seledtsov and V V Solovyev ldquoAnalysis ofcanonical and non-canonical splice sites in mammaliangenomesrdquoNucleic Acids Research vol 28 no 21 pp 4364ndash43752000

[31] V Shepelev and A Fedorov ldquoAdvances in the exon-intron dat-abase (EID)rdquo Briefings in Bioinformatics vol 7 no 2 pp 178ndash185 2006

[32] M Foote J P Hunter C M Janis and J J Sepkoski Jr ldquoEvolu-tionary and preservational constraints on origins of biologicgroups divergence times of eutherian mammalsrdquo Science vol283 no 5406 pp 1310ndash1314 1999

[33] M S Lee ldquoMolecular clock calibrations and metazoan diver-gence datesrdquo Journal of Molecular Evolution vol 49 no 3 pp385ndash391 1999

[34] K M Murphy P Travers and M Walport Janewayrsquos Immuno-biology Garland Science Oxford UK 7th edition 2007

[35] R Coico and G Sunshine Immunology A Short Course Wiley-Blackwell Hoboken NJ USA 6th edition 2008

[36] F Corpet ldquoMultiple sequence alignmentwith hierarchical clust-eringrdquo Nucleic Acids Research vol 16 no 22 pp 10881ndash108901988

[37] M L Samuels and J A Witmer Statistics for the Life SciencesPrentice Hall Upper Saddle River NJ USA 3rd edition 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor …downloads.hindawi.com/archive/2013/818954.pdf · 2019. 7. 31. · transforming growth factor beta

International Journal of Evolutionary Biology 7

Table 7 Conservation of groups of nucleotides (underlined) in the terminal hexanucleotides of the introns Percentages are over the total of216 couples of hexanucleotides

Sequence Random probability () Actual () Actual-expected () 119875

TTTxAG 345 plusmn 268 880 535 lt001TTxTAG 431 plusmn 298 926 495 lt001xTTTAG 169 plusmn 189 463 294 lt001TxTTAG 148 plusmn 177 417 269 lt001xTGCAG 227 plusmn 219 509 282 001

TTxxAG 2070 plusmn 595 2963 893 lt001xxTTAG 347 plusmn 269 648 301 002xTGxAG 450 plusmn 305 833 383 lt001

0

10

20

30

40

50

60

0 1 2 3 4

Figure 4 Abscissa number of nucleotide differences (irrespectiveof the position) in orthologous MouseHuman terminal hexanu-cleotides Ordinate percentage

Analysis of the terminal hexanucleotides showed thatthe nucleotide sequences TTTxAG TTxTAG xTTTAGTxTTAG and xTGCAG are significantly more expressedthan could be expected (Table 7) Furthermore some dinu-cleotides derived from the above sequences are themselvesoverexpressed TTxxAG (derived from TTTxAG and TTx-TAG) xxTTAG (derived from xTTTAG and TxTTAG) andxTGxAG (derived from xTGCAG) (Table 7)

Three hexanucleotides were found to be significantlyoverexpressed as compared to the random probabilityTTTCAG (expected 174 actual 463 119875 lt 001)TTTTAG (expected 072 actual 370 119875 lt 001) andCTGCAG (expected 054 actual 185 119875 = 003) Allthese hexanucleotides contained at least one of the conservedmotifs Other hexanucleotides although containing some ofthe overexpressed trinucleotides did not exhibit significantlydifferent conservation levels from those of the randomprobability and in no case did the occurrence of one of theconserved dinucleotides determine a higher conservation ofthe corresponding hexanucleotides

No motif associated with a significant underexpressionwas found

39 Splicing Variants Splicing variants of the individualgenes are usually of a different type in Mouse and Humanso the positions of the splicing sites no longer correspondin the two species (ie the orthology of the splicing sites is

lost) even though the main traits of the protein products arepreserved In only one instance in receptor KIT Mouse andHuman exhibited the same type of variant that is the use ofanother 51015840ss 12 nucleotides upstream of the canonical site

In several cases one entire exon present in the canonicalform is lacking in a variant although the other downstreamexons are regularly transcribed indicating that the constitu-tive 31015840ss at the end of the preceding intron did not operate Asan example in the transcript variant X1 (XM 005252446) ofthe Human IL2RA the canonical 31015840ss TTCCAG immediatelyupstream of exon 4 is silenced and the fourth exon (216 nt72 aa) of the canonical sequence (NM 000417) is lackingin this variant In other cases up to three consecutive 31015840sssare silenced in the Human IL2RG transcript variant X2(XM 005262262) 31015840sss ATCTAG CTCTAG and CTCCAGare all silenced with loss of exons 2 3 and 4 while exons5 6 7 and 8 are transcribed as in the canonical form(NM 000206) without alteration of the reading frame

In other cases the active alternative ss may be locatedupstream or downstream of the canonical ss resulting in analteration of the length of individual exons usually withoutframeshifts For example in theHuman variant X1 of CSF2RB(XM 005261340) the intron upstream of exon 7 ends byan ldquoearlyrdquo CCTCAG while the corresponding intron in thecanonical sequence (NM 007780) ends at another CCTCAG18 nt downstream resulting in an 18-nt longer exon in thevariant Although variations of the 31015840sss are more frequentthe 51015840sss may also be variable As an example in Humantranscript variant X1 of CD40 (XM 005260617) the 51015840ss afterexon 7 is GTGGGG replacing the GTGAGT of the canonicalvariant (NM 001250) which occurs 12 nt upstream

Detailed results are shown in the SupplementaryMaterialavailable at httpdxdoiorg1011552013818954

310 The Binding of the U1 snRNA We studied the ungappedbase-pairing between the U1 snRNA sequence ACTTAC(NCBI GenBank accession code NR 004430) conserved inMouse and Human and the last three nucleotides of theexons plus the corresponding 51015840ss hexanucleotide The 51015840ssGTAAGT which exactly base-pairs with the U1 is expressedin 190 of cases in Mouse and 171 of cases in Human(Table 2) In all other cases the match is not perfect Onaverage a complementary match between the U1 nucleotidesand the four variable nucleotides of the 51015840ss is achieved in

8 International Journal of Evolutionary Biology

67 of nucleotides The highest match is with the secondvariable nucleotide (the fourth of the 51015840ss) with 78 ofoccurrences while the matching of all the other variablenucleotides averages 50 Indeed all seven more expressed51015840ss of Mouse and Human have an ldquoArdquo in the fourth position(Table 2) The lowest matching is with the sixth nucleotideHowever the number of matches in the individual 51015840sss isvery variable and in a few cases (eg GTTTCG) only theinitial GTmatcheswithU1 In very few caseswe observed thatthe optimal ungapped base-pairing of U1 could be achievedby partly binding the last three nucleotides of the exon

4 Discussion

This study was aimed at describing the conservationmuta-tion dynamics of the intron ends of the cytokine receptorgenes during the speciation processes to Mouse and Humanwhich began starting from a common ancestor 65ndash85 MYA[32 33] We selected 26 orthologous genes of the two speciesin which the MouseHuman topographic correspondence ofexons had been consistently preserved The selected genescoded for receptors belonging to different cytokine receptorgroups (Table 1)We firstly identified the nucleotide identitiesat the corresponding positions in Mouse and Human in thefirst 50 and last 50 nucleotides of each intron All intronsstudied (216 couples) started and ended with the canonicaldinucleotides GT and AG respectively conforming to theusual situation This preliminary analysis demonstrated thatin general the four nucleotides following GT at the 51015840ss andthe four nucleotides preceding AG at the 31015840ss were morehighly conserved as compared to the other nucleotides atboth ends of the introns (Figures 1-2) We thus undertooka more detailed analysis of the initial and terminal intronichexanucleotides

All the 16 hexanucleotides beginning by GT and endingby AG might act both as 51015840ss and 31015840ss In our sample 8 ofthese sequences appeared either at 51015840 or at 31015840 but never inboth It may thus be hypothesized that some mechanism ofdisambiguation may operate possibly related to other exonicor intronic ldquosignalsrdquo involved in the splicing processes

The structure of the hexanucleotides actually expressed atthe beginning and end of the introns is quite variable even ifsome configurations are more frequent (Tables 2 and 4) Inour sample the initial hexanucleotide appeared in 51 differentconfigurations while the terminal hexanucleotide appearedin 94 different configurations indicating a lower selectivepressure on the terminal hexanucleotideThis is confirmed bythe MouseHuman conservationmutation data the averagemutation per hexanucleotide was 058 in the initial hexanu-cleotides and 117 in the terminal hexanucleotides and thetotal hexanucleotide conservation was 574 for initial hex-anucleotides and only 282 for terminal hexanucleotides

The percentages of occurrence of the more expressedinitial and terminal hexanucleotides did not differ signifi-cantly inMouse and Human (Tables 2 and 4)The percentageof the individual nucleotides at each position in the initialand terminal hexanucleotides was also similar in Mouseand Human and our results are in general comparable to

those reported in the literature [25 27 28] In our samplein the sixth nucleotide of the initial hexanucleotides C wassignificantly more expressed in the Mouse than in Humanand in the first nucleotide of the terminal hexanucleotidesG was significantly more expressed in the Mouse than inHuman (Tables 3 and 5) In our sample the nucleotideG was never present in the fourth position of terminalhexanucleotides

From the data on the actual frequencies of the indi-vidual nucleotides at each position in the initial and ter-minal hexanucleotides in Mouse and Human separately wecalculated the probability of random conservation of anygiven nucleotide at a given position in both species Theresults are shown in Table 3 (initial hexanucleotides) andTable 5 (terminal hexanucleotides) together with the resultsobtained on the actual conservation of each nucleotide ata given position in both Mouse and Human As shown inthe Tables the actual conservation was always significantlyhigher than the random conservation On average the actualconservation exceeded by 53 the random conservationin the initial hexanucleotides and by 80 in the terminalhexanucleotides These ldquogainsrdquo express the conservation dueto a selective evolutionary pressure to keep the configurationof the common ancestor at the level of the individualnucleotides

We also analyzed the conservation of groups of the vari-able nucleotides in initial and terminal hexanucleotides Therandom conservation of groups of three or two nucleotideswas calculated and compared with the actual frequenciesAll the motifs of the initial hexanucleotides found to beassociated with a higher or a lower conservation are reportedin the Results section The most remarkably conservedtrinucleotide was xxGAGx while xxAAGx was significantlyunderexpressed Accordingly GTGAGT and GTGAGGwereoverexpressed while GTAAGG was underexpressed Since inthe third position A is more conserved than G by about70 GTGAGT and GTGAGG should be less expressedthan GTAAGT and GTAAGG respectively by a similarpercentage Actually GTGAGT and GTGAGG were moreexpressed than the similar sequences with A in the thirdposition so that in our sample GTGAGT was the mostfrequently expressed initial hexanucleotide and GTAAGTwas only the second most frequently occurring sequence(Table 2)

In the terminal hexanucleotides the sequences TTTxAGand xTGCAG were overexpressed and accordinglyTTTCAG TTTTAG and CTGCAG were more expressedthan the random expectance These three sequences arethe most frequently expressed terminal hexanucleotides(Table 4) No specific motif entailing terminal nucleotideunderexpression could be demonstrated

It should be remarked that the presence of overex-pressed or underexpressed trinucleotides seems to be acondition which although necessary is not sufficient perse to entail over- or underexpression of the correspondinghexanucleotides since the outcome also depends on the othervariable nucleotide

The present results suggest that the conservation of eachnucleotide of the initial and terminal hexanucleotides is

International Journal of Evolutionary Biology 9

dependent upon the conservationmutation of the othernucleotides and an evolutionary selective pressure oper-ates to keep certain global configurations derived from theMouseHuman common ancestor while other configurationstend to be less conserved In particular the initial con-figuration GTAAGT which corresponded to the maximalconservation of the individual nucleotides at positions 3 to6 (Table 3 column 6) and is the one which exactly base-pairswith U1 snRNA appeared to be negatively biased

In general the requirement for base-pairing of the initialhexanucleotide with U1 seems not to be stringent since inour sample complete matching was observed in 18 only ofthe 51015840sss and 51015840sss with as few as three matching nucleotidesbeing effective (13 of cases) In very few cases optimal base-pairing ofU1 could be achieved by partly binding the terminaltrinucleotide of the exon

Analysis of the transcript variants of the genes consideredin the present study revealed that no type of variant iscommon in Mouse and Human except in one single caseThis suggests that these variants were not inherited fromthe common ancestor but emerged only later during thespeciation process One frequent variant is the silencingof a constitutive 31015840ss which leads to the skipping of theimmediately downstream exon with transcription being thenresumed for the other downstream exons Another type ofvariant is the activation of an alternative 51015840ss or 31015840ss usuallylocated a few nucleotides upstream or downstream of thecanonical splicing site the latter being silenced This leadsto alterations of the nucleotide number of the neighboringexons but usually without frameshifts Most constitutive sssespecially at 51015840 are robust enough to be maintained steadilyactive but 104 of the 31015840sss and 32 of the 51015840sss were foundto be variable Our analysis could not reveal any specific trendtowards the silencing of constitutive sss or the activation ofalternative sss as certain hexanucleotides which are silencedat given sites are on the contrary activated at other sites toreplace constitutive sss as for example the 51015840sss GTGAGAand GTGGGA and the 31015840ss CTCCAG Possibly silencingor activation may depend in a complex way on the generalcontext [20]

To conclude analysis of the nucleotide conservation inthe 51015840ss and 31015840ss hexanucleotides of Mouse and Humanintrons reveals definite evolutionary positive biases towardsthe conservation of some specific configurations and neg-ative biases against other configurations However the 51015840sshexanucleotides and especially the 31015840ss hexanucleotides stillexhibit a wide structural variability confirming the con-tention that splicing depends on a complex code whichbesides the 51015840ss and 31015840ss possibly involves additional signalsfrom the neighboring exons or from sections of the intronother than the splice sites and might also be regulated by thesecondary and tertiary structures forming in the pre-mRNAmolecule [12ndash14 16ndash23]

Acknowledgments

The authors gratefully thank Doctor Mary Victoria CandacePragnell and Professor Vincenzo Mitolo for constructivecriticisms and reviewing the paper

References

[1] S Stamm S Ben-Ari I Rafalska et al ldquoFunction of alternativesplicingrdquo Gene vol 344 pp 1ndash20 2005

[2] A J Matlin F Clark and C W J Smith ldquoUnderstanding alte-rnative splicing towards a cellular coderdquoNature ReviewsMolec-ular Cell Biology vol 6 no 5 pp 386ndash398 2005

[3] M Hallegger M Llorian and C W J Smith ldquoAlternativesplicing global insights minireviewrdquo FEBS Journal vol 277 no4 pp 856ndash866 2010

[4] E Conti and E Izaurralde ldquoNonsense-mediated mRNA decaymolecular insights and mechanistic variations across speciesrdquoCurrent Opinion in Cell Biology vol 17 no 3 pp 316ndash325 2005

[5] F Lejeune and L E Maquat ldquoMechanistic links between non-sense-mediated mRNA decay and pre-mRNA splicing in mam-malian cellsrdquo Current Opinion in Cell Biology vol 17 no 3 pp309ndash315 2005

[6] J Weischenfeldt J Lykke-Andersen and B Porse ldquoMessengerRNA surveillance neutralizing natural nonsenserdquoCurrent Biol-ogy vol 15 no 14 pp R559ndashR562 2005

[7] E V Koonin ldquoThe origin of introns and their role in eukaryo-genesis a compromise solution to the introns-early versusintrons-late debaterdquoBiologyDirect vol 1 no 22 pp 1ndash23 2006

[8] J P Orengo and T A Cooper ldquoAlternative splicing in diseaserdquoAdvances in Experimental Medicine and Biology vol 623 pp212ndash223 2007

[9] A S Solis N Shariat and J G Patton ldquoSplicing fidelity enhan-cers and diseaserdquo Frontiers in Bioscience vol 13 no 5 pp 1926ndash1942 2008

[10] C A Pettigrew and M A Brown ldquoPre-mRNA splicing aberra-tions and cancerrdquo Frontiers in Bioscience vol 13 no 3 pp 1090ndash1105 2008

[11] M A Panaro V Mitolo A Cianciulli P Cavallo C I MitoloandAAcquafredda ldquoTheHIV-1 Rev binding family of proteinsthe dog proteins as a study modelrdquo Endocrine Metabolic andImmune Disorders vol 8 no 1 pp 30ndash46 2008

[12] M D Adams D Z Rudner and D C Rio ldquoBiochemistry andregulation of pre-mRNA splicingrdquo Current Opinion in CellBiology vol 8 no 3 pp 331ndash339 1996

[13] A R Krainer Eukaryotic mRNA Processing Oxford UniversityPress New York NY USA 1997

[14] C B Burge T Tuschl and P A Sharp Splicing of Precursors tomRNAs by the Spliceosome Cold Spring Laboratory HarborPress Cold Spring Harbor NY USA 1999

[15] B J Blencowe ldquoExonic splicing enhancers mechanism ofaction diversity and role in human genetic diseasesrdquo Trends inBiochemical Sciences vol 25 no 3 pp 106ndash110 2000

[16] X H Zhang and L A Chasin ldquoComputational definition ofsequence motifs governing constitutive exon splicingrdquo Genesand Development vol 18 no 11 pp 1241ndash1250 2004

[17] J Hui L Hung M Heiner et al ldquoIntronic CA-repeat and CA-rich elements a new class of regulators of mammalian alterna-tive splicingrdquoEMBO Journal vol 24 no 11 pp 1988ndash1998 2005

[18] J L Kabat S Barberan-Soler P McKenna H Clawson TFarrer and A M Zahler ldquoIntronic alternative splicing regula-tors identified by comparative genomics in nematodesrdquo PLoSComputational Biology vol 2 no 7 p e86 2006

[19] C Dominguez and F H-T Allain ldquoNMR structure of the threequasi RNA recognition motifs (qRRMs) of human hnRNP Fand interaction studies with Bcl-x G-tract RNA a novel modeof RNA recognitionrdquo Nucleic Acids Research vol 34 no 13 pp3634ndash3645 2006

10 International Journal of Evolutionary Biology

[20] L A Chasin ldquoSearching for splicing motifsrdquo Advances in Expe-rimental Medicine and Biology vol 623 pp 85ndash106 2007

[21] LHHungMHeiner J Hui S Schreiner V Benes andA Bin-dereif ldquoDiverse roles of hnRNP L in mammalian mRNA proc-essing a combined microarray and RNAi analysisrdquo RNA vol14 no 2 pp 284ndash296 2008

[22] M A Panaro A Cianciulli R Calvello et al ldquoAn analysis of thehuman chemokine CXC receptor 4 generdquo Immunopharmacol-ogy and Immunotoxicology vol 31 no 1 pp 88ndash93 2009

[23] J Zhang C C Kuo and L Chen ldquoGC content around splicesites affects splicing through pre-mRNA secondary structuresrdquoBMC Genomics vol 12 no 90 pp 1ndash11 2011

[24] TWNilsen ldquoThe spliceosome themost complexmacromolec-ularmachine in the cellrdquoBioEssays vol 25 no 12 pp 1147ndash11492003

[25] J F Abril R Castelo and R Guigo ldquoComparison of splice sitesin mammals and chickenrdquo Genome Research vol 15 no 1 pp111ndash119 2005

[26] N Sheth X Roca M L Hastings T Roeder A R Krainer andR Sachidanandam ldquoComprehensive splice-site analysis usingcomparative genomicsrdquo Nucleic Acids Research vol 34 no 14pp 3955ndash3967 2006

[27] S H Schwartz J Silva D Burstein T Pupko E Eyras and GAst ldquoLarge-scale comparative analysis of splicing signals andtheir corresponding splicing factors in eukaryotesrdquo GenomeResearch vol 18 no 1 pp 88ndash103 2008

[28] E V Kriventseva and M S Gelfand ldquoStatistical analysis of theexon-intron structure of higher and lower eukaryote genesrdquoJournal of Biomolecular Structure and Dynamics vol 17 no 2pp 281ndash288 1999

[29] M Burset I A Seledtsov and V V Solovyev ldquoSpliceDB dat-abase of canonical and non-canonical mammalian splice sitesrdquoNucleic Acids Research vol 29 no 1 pp 255ndash259 2001

[30] M Burset I A Seledtsov and V V Solovyev ldquoAnalysis ofcanonical and non-canonical splice sites in mammaliangenomesrdquoNucleic Acids Research vol 28 no 21 pp 4364ndash43752000

[31] V Shepelev and A Fedorov ldquoAdvances in the exon-intron dat-abase (EID)rdquo Briefings in Bioinformatics vol 7 no 2 pp 178ndash185 2006

[32] M Foote J P Hunter C M Janis and J J Sepkoski Jr ldquoEvolu-tionary and preservational constraints on origins of biologicgroups divergence times of eutherian mammalsrdquo Science vol283 no 5406 pp 1310ndash1314 1999

[33] M S Lee ldquoMolecular clock calibrations and metazoan diver-gence datesrdquo Journal of Molecular Evolution vol 49 no 3 pp385ndash391 1999

[34] K M Murphy P Travers and M Walport Janewayrsquos Immuno-biology Garland Science Oxford UK 7th edition 2007

[35] R Coico and G Sunshine Immunology A Short Course Wiley-Blackwell Hoboken NJ USA 6th edition 2008

[36] F Corpet ldquoMultiple sequence alignmentwith hierarchical clust-eringrdquo Nucleic Acids Research vol 16 no 22 pp 10881ndash108901988

[37] M L Samuels and J A Witmer Statistics for the Life SciencesPrentice Hall Upper Saddle River NJ USA 3rd edition 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor …downloads.hindawi.com/archive/2013/818954.pdf · 2019. 7. 31. · transforming growth factor beta

8 International Journal of Evolutionary Biology

67 of nucleotides The highest match is with the secondvariable nucleotide (the fourth of the 51015840ss) with 78 ofoccurrences while the matching of all the other variablenucleotides averages 50 Indeed all seven more expressed51015840ss of Mouse and Human have an ldquoArdquo in the fourth position(Table 2) The lowest matching is with the sixth nucleotideHowever the number of matches in the individual 51015840sss isvery variable and in a few cases (eg GTTTCG) only theinitial GTmatcheswithU1 In very few caseswe observed thatthe optimal ungapped base-pairing of U1 could be achievedby partly binding the last three nucleotides of the exon

4 Discussion

This study was aimed at describing the conservationmuta-tion dynamics of the intron ends of the cytokine receptorgenes during the speciation processes to Mouse and Humanwhich began starting from a common ancestor 65ndash85 MYA[32 33] We selected 26 orthologous genes of the two speciesin which the MouseHuman topographic correspondence ofexons had been consistently preserved The selected genescoded for receptors belonging to different cytokine receptorgroups (Table 1)We firstly identified the nucleotide identitiesat the corresponding positions in Mouse and Human in thefirst 50 and last 50 nucleotides of each intron All intronsstudied (216 couples) started and ended with the canonicaldinucleotides GT and AG respectively conforming to theusual situation This preliminary analysis demonstrated thatin general the four nucleotides following GT at the 51015840ss andthe four nucleotides preceding AG at the 31015840ss were morehighly conserved as compared to the other nucleotides atboth ends of the introns (Figures 1-2) We thus undertooka more detailed analysis of the initial and terminal intronichexanucleotides

All the 16 hexanucleotides beginning by GT and endingby AG might act both as 51015840ss and 31015840ss In our sample 8 ofthese sequences appeared either at 51015840 or at 31015840 but never inboth It may thus be hypothesized that some mechanism ofdisambiguation may operate possibly related to other exonicor intronic ldquosignalsrdquo involved in the splicing processes

The structure of the hexanucleotides actually expressed atthe beginning and end of the introns is quite variable even ifsome configurations are more frequent (Tables 2 and 4) Inour sample the initial hexanucleotide appeared in 51 differentconfigurations while the terminal hexanucleotide appearedin 94 different configurations indicating a lower selectivepressure on the terminal hexanucleotideThis is confirmed bythe MouseHuman conservationmutation data the averagemutation per hexanucleotide was 058 in the initial hexanu-cleotides and 117 in the terminal hexanucleotides and thetotal hexanucleotide conservation was 574 for initial hex-anucleotides and only 282 for terminal hexanucleotides

The percentages of occurrence of the more expressedinitial and terminal hexanucleotides did not differ signifi-cantly inMouse and Human (Tables 2 and 4)The percentageof the individual nucleotides at each position in the initialand terminal hexanucleotides was also similar in Mouseand Human and our results are in general comparable to

those reported in the literature [25 27 28] In our samplein the sixth nucleotide of the initial hexanucleotides C wassignificantly more expressed in the Mouse than in Humanand in the first nucleotide of the terminal hexanucleotidesG was significantly more expressed in the Mouse than inHuman (Tables 3 and 5) In our sample the nucleotideG was never present in the fourth position of terminalhexanucleotides

From the data on the actual frequencies of the indi-vidual nucleotides at each position in the initial and ter-minal hexanucleotides in Mouse and Human separately wecalculated the probability of random conservation of anygiven nucleotide at a given position in both species Theresults are shown in Table 3 (initial hexanucleotides) andTable 5 (terminal hexanucleotides) together with the resultsobtained on the actual conservation of each nucleotide ata given position in both Mouse and Human As shown inthe Tables the actual conservation was always significantlyhigher than the random conservation On average the actualconservation exceeded by 53 the random conservationin the initial hexanucleotides and by 80 in the terminalhexanucleotides These ldquogainsrdquo express the conservation dueto a selective evolutionary pressure to keep the configurationof the common ancestor at the level of the individualnucleotides

We also analyzed the conservation of groups of the vari-able nucleotides in initial and terminal hexanucleotides Therandom conservation of groups of three or two nucleotideswas calculated and compared with the actual frequenciesAll the motifs of the initial hexanucleotides found to beassociated with a higher or a lower conservation are reportedin the Results section The most remarkably conservedtrinucleotide was xxGAGx while xxAAGx was significantlyunderexpressed Accordingly GTGAGT and GTGAGGwereoverexpressed while GTAAGG was underexpressed Since inthe third position A is more conserved than G by about70 GTGAGT and GTGAGG should be less expressedthan GTAAGT and GTAAGG respectively by a similarpercentage Actually GTGAGT and GTGAGG were moreexpressed than the similar sequences with A in the thirdposition so that in our sample GTGAGT was the mostfrequently expressed initial hexanucleotide and GTAAGTwas only the second most frequently occurring sequence(Table 2)

In the terminal hexanucleotides the sequences TTTxAGand xTGCAG were overexpressed and accordinglyTTTCAG TTTTAG and CTGCAG were more expressedthan the random expectance These three sequences arethe most frequently expressed terminal hexanucleotides(Table 4) No specific motif entailing terminal nucleotideunderexpression could be demonstrated

It should be remarked that the presence of overex-pressed or underexpressed trinucleotides seems to be acondition which although necessary is not sufficient perse to entail over- or underexpression of the correspondinghexanucleotides since the outcome also depends on the othervariable nucleotide

The present results suggest that the conservation of eachnucleotide of the initial and terminal hexanucleotides is

International Journal of Evolutionary Biology 9

dependent upon the conservationmutation of the othernucleotides and an evolutionary selective pressure oper-ates to keep certain global configurations derived from theMouseHuman common ancestor while other configurationstend to be less conserved In particular the initial con-figuration GTAAGT which corresponded to the maximalconservation of the individual nucleotides at positions 3 to6 (Table 3 column 6) and is the one which exactly base-pairswith U1 snRNA appeared to be negatively biased

In general the requirement for base-pairing of the initialhexanucleotide with U1 seems not to be stringent since inour sample complete matching was observed in 18 only ofthe 51015840sss and 51015840sss with as few as three matching nucleotidesbeing effective (13 of cases) In very few cases optimal base-pairing ofU1 could be achieved by partly binding the terminaltrinucleotide of the exon

Analysis of the transcript variants of the genes consideredin the present study revealed that no type of variant iscommon in Mouse and Human except in one single caseThis suggests that these variants were not inherited fromthe common ancestor but emerged only later during thespeciation process One frequent variant is the silencingof a constitutive 31015840ss which leads to the skipping of theimmediately downstream exon with transcription being thenresumed for the other downstream exons Another type ofvariant is the activation of an alternative 51015840ss or 31015840ss usuallylocated a few nucleotides upstream or downstream of thecanonical splicing site the latter being silenced This leadsto alterations of the nucleotide number of the neighboringexons but usually without frameshifts Most constitutive sssespecially at 51015840 are robust enough to be maintained steadilyactive but 104 of the 31015840sss and 32 of the 51015840sss were foundto be variable Our analysis could not reveal any specific trendtowards the silencing of constitutive sss or the activation ofalternative sss as certain hexanucleotides which are silencedat given sites are on the contrary activated at other sites toreplace constitutive sss as for example the 51015840sss GTGAGAand GTGGGA and the 31015840ss CTCCAG Possibly silencingor activation may depend in a complex way on the generalcontext [20]

To conclude analysis of the nucleotide conservation inthe 51015840ss and 31015840ss hexanucleotides of Mouse and Humanintrons reveals definite evolutionary positive biases towardsthe conservation of some specific configurations and neg-ative biases against other configurations However the 51015840sshexanucleotides and especially the 31015840ss hexanucleotides stillexhibit a wide structural variability confirming the con-tention that splicing depends on a complex code whichbesides the 51015840ss and 31015840ss possibly involves additional signalsfrom the neighboring exons or from sections of the intronother than the splice sites and might also be regulated by thesecondary and tertiary structures forming in the pre-mRNAmolecule [12ndash14 16ndash23]

Acknowledgments

The authors gratefully thank Doctor Mary Victoria CandacePragnell and Professor Vincenzo Mitolo for constructivecriticisms and reviewing the paper

References

[1] S Stamm S Ben-Ari I Rafalska et al ldquoFunction of alternativesplicingrdquo Gene vol 344 pp 1ndash20 2005

[2] A J Matlin F Clark and C W J Smith ldquoUnderstanding alte-rnative splicing towards a cellular coderdquoNature ReviewsMolec-ular Cell Biology vol 6 no 5 pp 386ndash398 2005

[3] M Hallegger M Llorian and C W J Smith ldquoAlternativesplicing global insights minireviewrdquo FEBS Journal vol 277 no4 pp 856ndash866 2010

[4] E Conti and E Izaurralde ldquoNonsense-mediated mRNA decaymolecular insights and mechanistic variations across speciesrdquoCurrent Opinion in Cell Biology vol 17 no 3 pp 316ndash325 2005

[5] F Lejeune and L E Maquat ldquoMechanistic links between non-sense-mediated mRNA decay and pre-mRNA splicing in mam-malian cellsrdquo Current Opinion in Cell Biology vol 17 no 3 pp309ndash315 2005

[6] J Weischenfeldt J Lykke-Andersen and B Porse ldquoMessengerRNA surveillance neutralizing natural nonsenserdquoCurrent Biol-ogy vol 15 no 14 pp R559ndashR562 2005

[7] E V Koonin ldquoThe origin of introns and their role in eukaryo-genesis a compromise solution to the introns-early versusintrons-late debaterdquoBiologyDirect vol 1 no 22 pp 1ndash23 2006

[8] J P Orengo and T A Cooper ldquoAlternative splicing in diseaserdquoAdvances in Experimental Medicine and Biology vol 623 pp212ndash223 2007

[9] A S Solis N Shariat and J G Patton ldquoSplicing fidelity enhan-cers and diseaserdquo Frontiers in Bioscience vol 13 no 5 pp 1926ndash1942 2008

[10] C A Pettigrew and M A Brown ldquoPre-mRNA splicing aberra-tions and cancerrdquo Frontiers in Bioscience vol 13 no 3 pp 1090ndash1105 2008

[11] M A Panaro V Mitolo A Cianciulli P Cavallo C I MitoloandAAcquafredda ldquoTheHIV-1 Rev binding family of proteinsthe dog proteins as a study modelrdquo Endocrine Metabolic andImmune Disorders vol 8 no 1 pp 30ndash46 2008

[12] M D Adams D Z Rudner and D C Rio ldquoBiochemistry andregulation of pre-mRNA splicingrdquo Current Opinion in CellBiology vol 8 no 3 pp 331ndash339 1996

[13] A R Krainer Eukaryotic mRNA Processing Oxford UniversityPress New York NY USA 1997

[14] C B Burge T Tuschl and P A Sharp Splicing of Precursors tomRNAs by the Spliceosome Cold Spring Laboratory HarborPress Cold Spring Harbor NY USA 1999

[15] B J Blencowe ldquoExonic splicing enhancers mechanism ofaction diversity and role in human genetic diseasesrdquo Trends inBiochemical Sciences vol 25 no 3 pp 106ndash110 2000

[16] X H Zhang and L A Chasin ldquoComputational definition ofsequence motifs governing constitutive exon splicingrdquo Genesand Development vol 18 no 11 pp 1241ndash1250 2004

[17] J Hui L Hung M Heiner et al ldquoIntronic CA-repeat and CA-rich elements a new class of regulators of mammalian alterna-tive splicingrdquoEMBO Journal vol 24 no 11 pp 1988ndash1998 2005

[18] J L Kabat S Barberan-Soler P McKenna H Clawson TFarrer and A M Zahler ldquoIntronic alternative splicing regula-tors identified by comparative genomics in nematodesrdquo PLoSComputational Biology vol 2 no 7 p e86 2006

[19] C Dominguez and F H-T Allain ldquoNMR structure of the threequasi RNA recognition motifs (qRRMs) of human hnRNP Fand interaction studies with Bcl-x G-tract RNA a novel modeof RNA recognitionrdquo Nucleic Acids Research vol 34 no 13 pp3634ndash3645 2006

10 International Journal of Evolutionary Biology

[20] L A Chasin ldquoSearching for splicing motifsrdquo Advances in Expe-rimental Medicine and Biology vol 623 pp 85ndash106 2007

[21] LHHungMHeiner J Hui S Schreiner V Benes andA Bin-dereif ldquoDiverse roles of hnRNP L in mammalian mRNA proc-essing a combined microarray and RNAi analysisrdquo RNA vol14 no 2 pp 284ndash296 2008

[22] M A Panaro A Cianciulli R Calvello et al ldquoAn analysis of thehuman chemokine CXC receptor 4 generdquo Immunopharmacol-ogy and Immunotoxicology vol 31 no 1 pp 88ndash93 2009

[23] J Zhang C C Kuo and L Chen ldquoGC content around splicesites affects splicing through pre-mRNA secondary structuresrdquoBMC Genomics vol 12 no 90 pp 1ndash11 2011

[24] TWNilsen ldquoThe spliceosome themost complexmacromolec-ularmachine in the cellrdquoBioEssays vol 25 no 12 pp 1147ndash11492003

[25] J F Abril R Castelo and R Guigo ldquoComparison of splice sitesin mammals and chickenrdquo Genome Research vol 15 no 1 pp111ndash119 2005

[26] N Sheth X Roca M L Hastings T Roeder A R Krainer andR Sachidanandam ldquoComprehensive splice-site analysis usingcomparative genomicsrdquo Nucleic Acids Research vol 34 no 14pp 3955ndash3967 2006

[27] S H Schwartz J Silva D Burstein T Pupko E Eyras and GAst ldquoLarge-scale comparative analysis of splicing signals andtheir corresponding splicing factors in eukaryotesrdquo GenomeResearch vol 18 no 1 pp 88ndash103 2008

[28] E V Kriventseva and M S Gelfand ldquoStatistical analysis of theexon-intron structure of higher and lower eukaryote genesrdquoJournal of Biomolecular Structure and Dynamics vol 17 no 2pp 281ndash288 1999

[29] M Burset I A Seledtsov and V V Solovyev ldquoSpliceDB dat-abase of canonical and non-canonical mammalian splice sitesrdquoNucleic Acids Research vol 29 no 1 pp 255ndash259 2001

[30] M Burset I A Seledtsov and V V Solovyev ldquoAnalysis ofcanonical and non-canonical splice sites in mammaliangenomesrdquoNucleic Acids Research vol 28 no 21 pp 4364ndash43752000

[31] V Shepelev and A Fedorov ldquoAdvances in the exon-intron dat-abase (EID)rdquo Briefings in Bioinformatics vol 7 no 2 pp 178ndash185 2006

[32] M Foote J P Hunter C M Janis and J J Sepkoski Jr ldquoEvolu-tionary and preservational constraints on origins of biologicgroups divergence times of eutherian mammalsrdquo Science vol283 no 5406 pp 1310ndash1314 1999

[33] M S Lee ldquoMolecular clock calibrations and metazoan diver-gence datesrdquo Journal of Molecular Evolution vol 49 no 3 pp385ndash391 1999

[34] K M Murphy P Travers and M Walport Janewayrsquos Immuno-biology Garland Science Oxford UK 7th edition 2007

[35] R Coico and G Sunshine Immunology A Short Course Wiley-Blackwell Hoboken NJ USA 6th edition 2008

[36] F Corpet ldquoMultiple sequence alignmentwith hierarchical clust-eringrdquo Nucleic Acids Research vol 16 no 22 pp 10881ndash108901988

[37] M L Samuels and J A Witmer Statistics for the Life SciencesPrentice Hall Upper Saddle River NJ USA 3rd edition 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor …downloads.hindawi.com/archive/2013/818954.pdf · 2019. 7. 31. · transforming growth factor beta

International Journal of Evolutionary Biology 9

dependent upon the conservationmutation of the othernucleotides and an evolutionary selective pressure oper-ates to keep certain global configurations derived from theMouseHuman common ancestor while other configurationstend to be less conserved In particular the initial con-figuration GTAAGT which corresponded to the maximalconservation of the individual nucleotides at positions 3 to6 (Table 3 column 6) and is the one which exactly base-pairswith U1 snRNA appeared to be negatively biased

In general the requirement for base-pairing of the initialhexanucleotide with U1 seems not to be stringent since inour sample complete matching was observed in 18 only ofthe 51015840sss and 51015840sss with as few as three matching nucleotidesbeing effective (13 of cases) In very few cases optimal base-pairing ofU1 could be achieved by partly binding the terminaltrinucleotide of the exon

Analysis of the transcript variants of the genes consideredin the present study revealed that no type of variant iscommon in Mouse and Human except in one single caseThis suggests that these variants were not inherited fromthe common ancestor but emerged only later during thespeciation process One frequent variant is the silencingof a constitutive 31015840ss which leads to the skipping of theimmediately downstream exon with transcription being thenresumed for the other downstream exons Another type ofvariant is the activation of an alternative 51015840ss or 31015840ss usuallylocated a few nucleotides upstream or downstream of thecanonical splicing site the latter being silenced This leadsto alterations of the nucleotide number of the neighboringexons but usually without frameshifts Most constitutive sssespecially at 51015840 are robust enough to be maintained steadilyactive but 104 of the 31015840sss and 32 of the 51015840sss were foundto be variable Our analysis could not reveal any specific trendtowards the silencing of constitutive sss or the activation ofalternative sss as certain hexanucleotides which are silencedat given sites are on the contrary activated at other sites toreplace constitutive sss as for example the 51015840sss GTGAGAand GTGGGA and the 31015840ss CTCCAG Possibly silencingor activation may depend in a complex way on the generalcontext [20]

To conclude analysis of the nucleotide conservation inthe 51015840ss and 31015840ss hexanucleotides of Mouse and Humanintrons reveals definite evolutionary positive biases towardsthe conservation of some specific configurations and neg-ative biases against other configurations However the 51015840sshexanucleotides and especially the 31015840ss hexanucleotides stillexhibit a wide structural variability confirming the con-tention that splicing depends on a complex code whichbesides the 51015840ss and 31015840ss possibly involves additional signalsfrom the neighboring exons or from sections of the intronother than the splice sites and might also be regulated by thesecondary and tertiary structures forming in the pre-mRNAmolecule [12ndash14 16ndash23]

Acknowledgments

The authors gratefully thank Doctor Mary Victoria CandacePragnell and Professor Vincenzo Mitolo for constructivecriticisms and reviewing the paper

References

[1] S Stamm S Ben-Ari I Rafalska et al ldquoFunction of alternativesplicingrdquo Gene vol 344 pp 1ndash20 2005

[2] A J Matlin F Clark and C W J Smith ldquoUnderstanding alte-rnative splicing towards a cellular coderdquoNature ReviewsMolec-ular Cell Biology vol 6 no 5 pp 386ndash398 2005

[3] M Hallegger M Llorian and C W J Smith ldquoAlternativesplicing global insights minireviewrdquo FEBS Journal vol 277 no4 pp 856ndash866 2010

[4] E Conti and E Izaurralde ldquoNonsense-mediated mRNA decaymolecular insights and mechanistic variations across speciesrdquoCurrent Opinion in Cell Biology vol 17 no 3 pp 316ndash325 2005

[5] F Lejeune and L E Maquat ldquoMechanistic links between non-sense-mediated mRNA decay and pre-mRNA splicing in mam-malian cellsrdquo Current Opinion in Cell Biology vol 17 no 3 pp309ndash315 2005

[6] J Weischenfeldt J Lykke-Andersen and B Porse ldquoMessengerRNA surveillance neutralizing natural nonsenserdquoCurrent Biol-ogy vol 15 no 14 pp R559ndashR562 2005

[7] E V Koonin ldquoThe origin of introns and their role in eukaryo-genesis a compromise solution to the introns-early versusintrons-late debaterdquoBiologyDirect vol 1 no 22 pp 1ndash23 2006

[8] J P Orengo and T A Cooper ldquoAlternative splicing in diseaserdquoAdvances in Experimental Medicine and Biology vol 623 pp212ndash223 2007

[9] A S Solis N Shariat and J G Patton ldquoSplicing fidelity enhan-cers and diseaserdquo Frontiers in Bioscience vol 13 no 5 pp 1926ndash1942 2008

[10] C A Pettigrew and M A Brown ldquoPre-mRNA splicing aberra-tions and cancerrdquo Frontiers in Bioscience vol 13 no 3 pp 1090ndash1105 2008

[11] M A Panaro V Mitolo A Cianciulli P Cavallo C I MitoloandAAcquafredda ldquoTheHIV-1 Rev binding family of proteinsthe dog proteins as a study modelrdquo Endocrine Metabolic andImmune Disorders vol 8 no 1 pp 30ndash46 2008

[12] M D Adams D Z Rudner and D C Rio ldquoBiochemistry andregulation of pre-mRNA splicingrdquo Current Opinion in CellBiology vol 8 no 3 pp 331ndash339 1996

[13] A R Krainer Eukaryotic mRNA Processing Oxford UniversityPress New York NY USA 1997

[14] C B Burge T Tuschl and P A Sharp Splicing of Precursors tomRNAs by the Spliceosome Cold Spring Laboratory HarborPress Cold Spring Harbor NY USA 1999

[15] B J Blencowe ldquoExonic splicing enhancers mechanism ofaction diversity and role in human genetic diseasesrdquo Trends inBiochemical Sciences vol 25 no 3 pp 106ndash110 2000

[16] X H Zhang and L A Chasin ldquoComputational definition ofsequence motifs governing constitutive exon splicingrdquo Genesand Development vol 18 no 11 pp 1241ndash1250 2004

[17] J Hui L Hung M Heiner et al ldquoIntronic CA-repeat and CA-rich elements a new class of regulators of mammalian alterna-tive splicingrdquoEMBO Journal vol 24 no 11 pp 1988ndash1998 2005

[18] J L Kabat S Barberan-Soler P McKenna H Clawson TFarrer and A M Zahler ldquoIntronic alternative splicing regula-tors identified by comparative genomics in nematodesrdquo PLoSComputational Biology vol 2 no 7 p e86 2006

[19] C Dominguez and F H-T Allain ldquoNMR structure of the threequasi RNA recognition motifs (qRRMs) of human hnRNP Fand interaction studies with Bcl-x G-tract RNA a novel modeof RNA recognitionrdquo Nucleic Acids Research vol 34 no 13 pp3634ndash3645 2006

10 International Journal of Evolutionary Biology

[20] L A Chasin ldquoSearching for splicing motifsrdquo Advances in Expe-rimental Medicine and Biology vol 623 pp 85ndash106 2007

[21] LHHungMHeiner J Hui S Schreiner V Benes andA Bin-dereif ldquoDiverse roles of hnRNP L in mammalian mRNA proc-essing a combined microarray and RNAi analysisrdquo RNA vol14 no 2 pp 284ndash296 2008

[22] M A Panaro A Cianciulli R Calvello et al ldquoAn analysis of thehuman chemokine CXC receptor 4 generdquo Immunopharmacol-ogy and Immunotoxicology vol 31 no 1 pp 88ndash93 2009

[23] J Zhang C C Kuo and L Chen ldquoGC content around splicesites affects splicing through pre-mRNA secondary structuresrdquoBMC Genomics vol 12 no 90 pp 1ndash11 2011

[24] TWNilsen ldquoThe spliceosome themost complexmacromolec-ularmachine in the cellrdquoBioEssays vol 25 no 12 pp 1147ndash11492003

[25] J F Abril R Castelo and R Guigo ldquoComparison of splice sitesin mammals and chickenrdquo Genome Research vol 15 no 1 pp111ndash119 2005

[26] N Sheth X Roca M L Hastings T Roeder A R Krainer andR Sachidanandam ldquoComprehensive splice-site analysis usingcomparative genomicsrdquo Nucleic Acids Research vol 34 no 14pp 3955ndash3967 2006

[27] S H Schwartz J Silva D Burstein T Pupko E Eyras and GAst ldquoLarge-scale comparative analysis of splicing signals andtheir corresponding splicing factors in eukaryotesrdquo GenomeResearch vol 18 no 1 pp 88ndash103 2008

[28] E V Kriventseva and M S Gelfand ldquoStatistical analysis of theexon-intron structure of higher and lower eukaryote genesrdquoJournal of Biomolecular Structure and Dynamics vol 17 no 2pp 281ndash288 1999

[29] M Burset I A Seledtsov and V V Solovyev ldquoSpliceDB dat-abase of canonical and non-canonical mammalian splice sitesrdquoNucleic Acids Research vol 29 no 1 pp 255ndash259 2001

[30] M Burset I A Seledtsov and V V Solovyev ldquoAnalysis ofcanonical and non-canonical splice sites in mammaliangenomesrdquoNucleic Acids Research vol 28 no 21 pp 4364ndash43752000

[31] V Shepelev and A Fedorov ldquoAdvances in the exon-intron dat-abase (EID)rdquo Briefings in Bioinformatics vol 7 no 2 pp 178ndash185 2006

[32] M Foote J P Hunter C M Janis and J J Sepkoski Jr ldquoEvolu-tionary and preservational constraints on origins of biologicgroups divergence times of eutherian mammalsrdquo Science vol283 no 5406 pp 1310ndash1314 1999

[33] M S Lee ldquoMolecular clock calibrations and metazoan diver-gence datesrdquo Journal of Molecular Evolution vol 49 no 3 pp385ndash391 1999

[34] K M Murphy P Travers and M Walport Janewayrsquos Immuno-biology Garland Science Oxford UK 7th edition 2007

[35] R Coico and G Sunshine Immunology A Short Course Wiley-Blackwell Hoboken NJ USA 6th edition 2008

[36] F Corpet ldquoMultiple sequence alignmentwith hierarchical clust-eringrdquo Nucleic Acids Research vol 16 no 22 pp 10881ndash108901988

[37] M L Samuels and J A Witmer Statistics for the Life SciencesPrentice Hall Upper Saddle River NJ USA 3rd edition 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 10: Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor …downloads.hindawi.com/archive/2013/818954.pdf · 2019. 7. 31. · transforming growth factor beta

10 International Journal of Evolutionary Biology

[20] L A Chasin ldquoSearching for splicing motifsrdquo Advances in Expe-rimental Medicine and Biology vol 623 pp 85ndash106 2007

[21] LHHungMHeiner J Hui S Schreiner V Benes andA Bin-dereif ldquoDiverse roles of hnRNP L in mammalian mRNA proc-essing a combined microarray and RNAi analysisrdquo RNA vol14 no 2 pp 284ndash296 2008

[22] M A Panaro A Cianciulli R Calvello et al ldquoAn analysis of thehuman chemokine CXC receptor 4 generdquo Immunopharmacol-ogy and Immunotoxicology vol 31 no 1 pp 88ndash93 2009

[23] J Zhang C C Kuo and L Chen ldquoGC content around splicesites affects splicing through pre-mRNA secondary structuresrdquoBMC Genomics vol 12 no 90 pp 1ndash11 2011

[24] TWNilsen ldquoThe spliceosome themost complexmacromolec-ularmachine in the cellrdquoBioEssays vol 25 no 12 pp 1147ndash11492003

[25] J F Abril R Castelo and R Guigo ldquoComparison of splice sitesin mammals and chickenrdquo Genome Research vol 15 no 1 pp111ndash119 2005

[26] N Sheth X Roca M L Hastings T Roeder A R Krainer andR Sachidanandam ldquoComprehensive splice-site analysis usingcomparative genomicsrdquo Nucleic Acids Research vol 34 no 14pp 3955ndash3967 2006

[27] S H Schwartz J Silva D Burstein T Pupko E Eyras and GAst ldquoLarge-scale comparative analysis of splicing signals andtheir corresponding splicing factors in eukaryotesrdquo GenomeResearch vol 18 no 1 pp 88ndash103 2008

[28] E V Kriventseva and M S Gelfand ldquoStatistical analysis of theexon-intron structure of higher and lower eukaryote genesrdquoJournal of Biomolecular Structure and Dynamics vol 17 no 2pp 281ndash288 1999

[29] M Burset I A Seledtsov and V V Solovyev ldquoSpliceDB dat-abase of canonical and non-canonical mammalian splice sitesrdquoNucleic Acids Research vol 29 no 1 pp 255ndash259 2001

[30] M Burset I A Seledtsov and V V Solovyev ldquoAnalysis ofcanonical and non-canonical splice sites in mammaliangenomesrdquoNucleic Acids Research vol 28 no 21 pp 4364ndash43752000

[31] V Shepelev and A Fedorov ldquoAdvances in the exon-intron dat-abase (EID)rdquo Briefings in Bioinformatics vol 7 no 2 pp 178ndash185 2006

[32] M Foote J P Hunter C M Janis and J J Sepkoski Jr ldquoEvolu-tionary and preservational constraints on origins of biologicgroups divergence times of eutherian mammalsrdquo Science vol283 no 5406 pp 1310ndash1314 1999

[33] M S Lee ldquoMolecular clock calibrations and metazoan diver-gence datesrdquo Journal of Molecular Evolution vol 49 no 3 pp385ndash391 1999

[34] K M Murphy P Travers and M Walport Janewayrsquos Immuno-biology Garland Science Oxford UK 7th edition 2007

[35] R Coico and G Sunshine Immunology A Short Course Wiley-Blackwell Hoboken NJ USA 6th edition 2008

[36] F Corpet ldquoMultiple sequence alignmentwith hierarchical clust-eringrdquo Nucleic Acids Research vol 16 no 22 pp 10881ndash108901988

[37] M L Samuels and J A Witmer Statistics for the Life SciencesPrentice Hall Upper Saddle River NJ USA 3rd edition 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 11: Research Article Conservation/Mutation in the Splice Sites of Cytokine Receptor …downloads.hindawi.com/archive/2013/818954.pdf · 2019. 7. 31. · transforming growth factor beta

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology