LUND UNIVERSITY
PO Box 117, 221 00 Lund, +46 46-222 00 00
Human endogenous retrovirus family HERV-K(HML-5): Status, evolution, and reconstruction of an ancient betaretrovirus in the human genome
Lavie, Laurence; Medstrand, Patrik; Schempp, Werner; Meese, Eckart; Mayer, Jens
Published in: Journal of Virology
DOI: 10.1128/JVI.78.16.8788-8798.2004
Published: 2004-01-01
Citation for published version (APA): Lavie, L., Medstrand, P., Schempp, W., Meese, E., & Mayer, J. (2004). Human endogenous retrovirus family HERV-K(HML-5): Status, evolution, and reconstruction of an ancient betaretrovirus in the human genome. Journal of Virology, 78(16), 8788-8798. DOI: 10.1128/JVI.78.16.8788-8798.2004
JOURNAL OF VIROLOGY, Aug. 2004, p. 8788–8798, Vol. 78, No. 16
0022-538X/04/$08.00+0 DOI: 10.1128/JVI.78.16.8788–8798.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Human Endogenous Retrovirus Family HERV-K(HML-5): Status, Evolution, and Reconstruction of an Ancient Betaretrovirus in the Human Genome†
Laurence Lavie,1 Patrik Medstrand,2 Werner Schempp,3 Eckart Meese,1 and Jens Mayer1*
Department of Human Genetics, University of Saarland, 66421 Homburg,1 and Institute of Human Genetics and Anthropology, University of Freiburg, 79106 Freiburg,3 Germany, and Department of Cell and Molecular Biology, Lund University, 22184 Lund, Sweden2
Received 27 December 2003/Accepted 12 April 2004
The human genome harbors numerous distinct families of so-called human endogenous retroviruses (HERV) which are remnants of exogenous retroviruses that entered the germ line millions of years ago. We describe here the hitherto little-characterized betaretrovirus HERV-K(HML-5) family (named HERVK22 in Repbase) in greater detail. Out of 139 proviruses, only a few loci represent full-length proviruses, and many lack gag, protease, and/or env gene regions. We generated a consensus sequence from multiple alignment of 62 HML-5 loci that displays open reading frames for the four major retroviral proteins. Four HML-5 long terminal repeat (LTR) subfamilies were identified that are associated with monophyletic proviral bodies, implying different evolution of HML-5 LTRs and genes. Sequence analysis indicated that the proviruses formed approximately 55 million years ago. Accordingly, HML-5 proviral sequences were detected in Old World and New World primates but not in prosimians. No recent activity is associated with this HERV family. We also conclude that the HML-5 consensus sequence primer binding site is identical to methionine tRNA. Therefore, the family should be designated HERV-M. Our study provides important insights into the structure and evolution of the oldest betaretrovirus in the primate genome known to date.
Approximately 8% of the human genome is derived from retrovirus-like elements termed endogenous retroviruses (ERV) (14). Most of them are likely remnants of exogenous retrovirus infection of the germ line which became fixed in the population millions of years ago. Intracellular retrotransposition events increased proviral copy number during evolution, resulting in the presence of a few to several thousand proviruses belonging to various ERV families. A variety of distinct human ERV (HERV) families have been identified, suggesting that the germ line was invaded by various exogenous retroviruses (23, 25, 41). The human genome was recently estimated to contain 30 to 50 distinct HERV families (11). About 100 different HERV sequences are defined in Repbase, a widely employed reference sequence database for repetitive elements (15). Unfortunately, there is no established nomenclature for HERV. We follow here a nomenclature that uses the single-letter amino acid code of the tRNA complementary to the primer binding site formerly used to prime reverse transcription. Accordingly, HERV-K denotes a number of HERV families having a lysine tRNA-like primer binding site. Phylogenetic analysis of HERV reverse transcriptase sequences has identified 10 HERV-K families in the human genome which were termed human MMTV-like (HML-1 to HML-10) because of homologies to the betaretrovirus mouse mammary tumor virus (MMTV) (1, 32). Repbase Update also lists 10 HERV-K families.
Structurally intact ERV contain gag, protease (prt), polymerase (pol), and envelope (env) genes and are flanked by long terminal repeats (LTRs). Two HERV families, HERV-K(HML-2) and HERV-W, contain intact open reading frames (ORFs) and encode functional proteins (2, 3, 18, 27, 29, 31, 36). Further proviral sequences with Env coding capacity were identified recently (9). Most HERV sequences are coding deficient because they accumulated a variety of mutations and deletions since provirus formation. The vast majority of HERV are present in the genome only as solitary LTRs due to homologous recombination between 5′ and 3′ LTRs.
Many HERV families were fixed in Old World primates after their evolutionary split from New World primates about 35 million years ago (41). For example, the HERV-K families HML-2, HML-3, and HML-6 were all found in Old World monkeys but are missing in New World primates. The HERV-K(HML-2) family seems to be relatively long-lived in the genome, displaying transpositional activity from its appearance in the primate lineage until after the human-chimpanzee split (7, 28, 33, 38). In contrast, other ERV families are short-lived. In the case of the HERV-K(HML-3) family, no activity in terms of provirus formation was indicated in the human evolutionary lineage over the last 30 million years (26). To date, only a few HERV families have been characterized in greater detail. Here we characterize the HERV-K(HML-5) family by analyzing proviruses in the human genome. We find that the HML-5 proviruses have an older origin than other betaretroviruses in the human genome, and we reconstruct a putative full-length coding-competent ancient betaretrovirus which targeted the primate germ line more than 50 million years ago.
* Corresponding author. Mailing address: Department of Human Genetics, Building 60, University of Saarland, Medical Faculty, 66421 Homburg, Germany. Phone: 49 6841 1626627. Fax: 49 6841 1626186. E-mail: [email protected].
† Supplemental material for this article may be found at http://jvi.asm.org/.
MATERIALS AND METHODS
HERV-K(HML-5) sequence retrieval. HERV-K(HML-5) proviruses (LTR22A/HERVK22/LTR22A; Repbase version 8.2.0 consensus sequences) were identified by downloading the human sequences specified as HERVK22 in the RepeatMasker annotation of the June 2002 (hg12) UCSC genome browser (http://genome.ucsc.edu/). We performed dot matrix comparisons between each proviral sequence and the Repbase consensus sequences using MacVector (Accelrys Inc.) with default settings. We identified boundaries between HERV-K(HML-5) proviral loci and flanking cellular sequences and the location of nonretroviral repetitive elements within the proviral portions by using RepeatMasker (provided by A. F. A. Smit and P. Green; RepeatMasker at http://www.repeatmasker.org). We subsequently removed LTRs and apparent non-HERV-K(HML-5) sequences.
Multiple alignment of HERV-K(HML-5) sequences and consensus sequence generation. We produced multiple alignments of HERV-K(HML-5) proviral sequences displaying a minimal size of 3 kb (after deletion of other repetitive elements). Multiple alignments were generated employing DIALIGN2 (35) provided by the Institut Pasteur web server (http://bioweb.pasteur.fr) or employing ClustalW (42) using default settings. We further optimized alignments by hand using the Se-Al program (provided by Andrew Rambaut; http://evolve.zoo.ox.ac.uk/). Consensus sequences, generated by the Boxshade program at the Institut Pasteur, were further evaluated and corrected manually. The resulting proviral sequence was analyzed for encoded retroviral proteins and conserved domains employing BlastP and CD-search at the National Center for Biotechnology Information.
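The consensus step can be illustrated with a minimal majority-rule sketch. It is not the Boxshade procedure used in the study: the function name `majority_consensus`, the 50% threshold, and the decision to drop columns in which a gap is the most frequent character are all assumptions made for illustration, and the manual correction described above is not modeled.

```python
from collections import Counter


def majority_consensus(aligned_seqs, gap="-", min_fraction=0.5):
    """Majority-rule consensus of equal-length aligned sequences.

    Assumptions (not taken from the paper): a column is dropped when the gap
    character is its most frequent symbol, and "N" is written when the most
    frequent base does not exceed min_fraction of the rows.
    """
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(symbol.upper() for symbol in column)
        symbol, n = counts.most_common(1)[0]
        if symbol == gap:
            continue  # most rows carry a deletion here: omit the column
        consensus.append(symbol if n / len(column) > min_fraction else "N")
    return "".join(consensus)
```

Called on a list of aligned rows (for example the proviral bodies after removal of foreign repeats), the function returns one ungapped string; positions without a clear majority are flagged with `N` so that they can be inspected by hand.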
Phylogenetic analysis. We employed PAUP* (Sinauer Associates) and the PHYLIP package (10), the latter provided by the Institut Pasteur web server, for phylogenetic analysis. The multiple alignment of LTR sequences generated as described above and several regions from the multiple alignment of proviral body regions were subjected to neighbor-joining analysis. Nucleotide distances were corrected according to the Kimura-2-parameter model (17). Gaps and ambiguous positions were excluded from analysis. Sequences obtained from various primate species were included in the respective multiple alignment portions of human sequences and were analyzed similarly.
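For readers unfamiliar with the Kimura-2-parameter correction, the following sketch computes the distance for one pair of aligned sequences, skipping gaps and ambiguous positions as stated above. It uses the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function name `k2p_distance` is hypothetical, and this is not the PAUP* or PHYLIP implementation used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura-2-parameter distance between two aligned DNA sequences.

    Positions where either sequence has a gap or a non-ACGT symbol are
    excluded, mirroring the exclusion of gaps and ambiguous positions.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: skip this column
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError for saturated pairs (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

The result is in substitutions per site; multiplying by 100 gives the percent divergence used in the age estimate.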
Evolutionary age of proviruses. Approximate integration times of HERV-K(HML-5) proviruses displaying almost full-length LTRs on both sides were estimated by determination of Kimura-2-parameter corrected nucleotide distances between 5′ and 3′ LTRs of particular proviruses by employing PAUP*. Proviral age (T) was estimated according to the following formula: $T = D/(2 \times 0.13)$, where D is the nucleotide divergence between the 2 LTR sequences and 0.13 is the estimated average substitution rate per nucleotide and million years (8, 19). The factor 2 considers that both LTRs diverged independently from each other.
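The age formula is simple enough to express directly. In the sketch below, D is taken in percent and the rate as 0.13% per site per million years; that unit reading is an assumption, chosen because it reproduces the figures reported in the Results (a mean divergence of 12.9% corresponding to roughly 50 million years). The function name `proviral_age_myr` is hypothetical.

```python
SUBSTITUTION_RATE = 0.13  # assumed unit: percent per site per million years


def proviral_age_myr(ltr_divergence_percent, rate=SUBSTITUTION_RATE):
    """Estimate provirus age in million years as T = D / (2 * rate).

    The factor 2 accounts for both LTRs accumulating substitutions
    independently since integration.
    """
    if rate <= 0:
        raise ValueError("substitution rate must be positive")
    return ltr_divergence_percent / (2 * rate)
```

For example, `proviral_age_myr(12.9)` evaluates 12.9 / 0.26, and the same conversion applied to the standard deviation of 3.87 gives the spread quoted alongside the mean age.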
Test for HERV-K(HML-5) homologous sequences in primate species. PCR primers corresponding to the 5′ portion of the gag gene (P1, 5′-AACAGTATATAAAAGTATTGAAACA-3′; and P2, 5′-TGCTTTTTCTTATCTCTTTATAAG-3′) and the env gene (P3, 5′-CCATTCACTCCAGATAATTTGT-3′; and P4, 5′-TTCTTTTCCAATCAAGGAGTGA-3′) within the HERV-K(HML-5) consensus sequence were used to amplify HERV-K(HML-5) proviruses from human and various other primate species genomic DNAs, together comprising four primate clades: Pan troglodytes, Pongo pygmaeus, Hylobates lar, Colobus guereza, Mandrillus sphinx, Macaca mulatta, Saimiri sciureus, Callithrix jacchus, Alouatta seniculus, and Nycticebus coucang. Genomic DNA (100 ng each) was subjected to PCR with 1 µM primers and 2.5 U of Taq polymerase (Invitrogen Inc.) in a final volume of 25 µl. PCR cycling conditions for gag and env amplification were as follows: 5 min at 94°C; 35 cycles of 45 s at 94°C, 45 s at 54°C, and 2 min at 72°C; 5 min at 72°C. PCR products for the gag region obtained from P. troglodytes, H. lar, M. mulatta, and A. seniculus were cloned into the pGEM-T vector (Promega). Sequences for both DNA strands were obtained using a SequiTherm Excel II DNA sequencing kit-LC (Biozym) and an automated DNA sequencer (Licor 4000-L; MWG). Raw sequence data were analyzed using the Sequencher software (Gene Codes Corp.).
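As a quick sanity check on the four primers listed above, the sketch below computes their length, GC fraction, and a rough melting temperature by the Wallace rule, Tm = 2(A+T) + 4(G+C). The Wallace rule is only a coarse approximation, especially for oligonucleotides longer than about 20 nt, and the paper does not state how the 54°C annealing temperature was chosen; the helper names are hypothetical.

```python
# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "P1": "AACAGTATATAAAAGTATTGAAACA",  # gag, forward
    "P2": "TGCTTTTTCTTATCTCTTTATAAG",   # gag, reverse
    "P3": "CCATTCACTCCAGATAATTTGT",     # env, forward
    "P4": "TTCTTTTCCAATCAAGGAGTGA",     # env, reverse
}


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C): 2 per A/T plus 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def primer_summary(primers=PRIMERS):
    """Return {name: (length, GC fraction, Wallace Tm)} for each primer."""
    return {
        name: (len(seq), gc_fraction(seq), wallace_tm(seq))
        for name, seq in primers.items()
    }
```

Calling `primer_summary()` returns one tuple per primer, which makes it easy to confirm that the members of each pair have comparable estimated melting temperatures.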
Nucleotide sequence accession numbers. We deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase.
RESULTS
Identification and structure of HERV-K(HML-5) proviruses. By using the annotation within the UCSC genome browser (http://genome.ucsc.edu/), we identified 139 HML-5 proviruses in the human genome. Twenty-three out of 139 (16.5%) loci were located on the Y chromosome, whereas other loci seemed randomly distributed along autosomes. We next performed dot matrix comparisons between proviral loci and the Repbase consensus sequences (LTR22A/HERVK22/LTR22A) to examine proviral structures in more detail. We generated schematic structures of 100 out of 139 HML-5 sequences (Fig. 1). We did not include 39 of the proviruses because those proviral sequences were heavily mutated due to integration of other repetitive elements, deletions, duplications, and/or inversions. The majority (123 proviruses) displayed larger deletions of retroviral genes: only 11 proviruses displayed a complete gag gene, 12 a complete prt gene, 45 a complete pol gene, and 15 a complete env gene. Among those, multiple sequences commonly displayed large deletions of about 1,410 and 1,320 bp within either the gag-prt junction or the env region, respectively, or in both. The first deletion removed a large portion of the 5′ part of the gag gene and the complete prt gene, and the second deletion removed a large central part of the env gene (Fig. 1). Sixteen proviruses were intact regarding the presence of retroviral genes, and nine of these resembled full-length proviral structures having LTRs on both sides flanking the putative coding regions.
Reconstruction of a coding-competent HML-5 provirus. Before multiple sequence alignment of HML-5 sequences, we employed RepeatMasker to identify and to subsequently delete non-HML-5 sequences from proviral sequence entries. A greater number of proviruses contained one or several other repetitive elements which likely integrated into the proviruses at different stages during primate evolution. L1 elements from the M1, P, and PA2-6 subfamilies and Alu elements from the Y, Yc, Ya5, Yb9, Sp, Sc, and Sg subfamilies were recognized (4, 16, 40). Furthermore, numerous LTR elements annotated in Repbase as LTR5B, -6A, -7, -7B, -10F, -12, -12B, -12C, -15, -17, -19, -30, and MER11C (15) were found (Table 1). We chose 62 proviral sequence entries with the least deletions for multiple alignment. By using Boxshade and subsequent visual correction, we generated an HML-5 consensus sequence from the alignment. The sequence displayed a total length of 6,824 bp and was 4% divergent in sequence compared to the HERVK22 consensus sequence present in Repbase. Several nucleotide differences adjusted reading frames with regard to the Repbase sequence. Two 1-bp and an 11-bp insertion were introduced into the 5′ intergenic region. A 3-bp sequence was introduced, and 2 and 1 bp were removed in the gag, prt, and pol gene regions, respectively. In addition, a stop codon within the prt gene and three stop codons within pol could be corrected.
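Whether a frameshift or stop-codon correction restores a reading frame can be checked by scanning for the longest stop-free stretch. The sketch below does this for the three forward frames of a DNA string. It is an illustration only, not the authors' procedure: the function name `longest_open_frame` is hypothetical, no start codon is required, and the reverse strand is ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_open_frame(seq):
    """Return (frame, start, end) of the longest stop-free codon run.

    Coordinates are 0-based and end-exclusive; only the three forward
    frames are scanned. On ties the earliest frame and position win.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                # A stop closes the current run (the stop itself is excluded).
                if pos - run_start > best[2] - best[1]:
                    best = (frame, run_start, pos)
                run_start = pos + 3
            pos += 3
        # Close the final run at the last complete codon of this frame.
        if pos - run_start > best[2] - best[1]:
            best = (frame, run_start, pos)
    return best
```

Comparing the result before and after editing a consensus (for instance, replacing an in-frame stop codon) shows directly whether the open stretch in that frame has been extended.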
Translation of our consensus sequence gave rise to ORFs for four major retroviral proteins that were not found as such in the Repbase sequence. ORFs for putative retroviral gag, prt, pol, and env genes could be defined (Fig. 1; see also Fig. 1A in the supplemental material, which is also available from us upon request). The putative 517-, 265-, 911-, and 694-amino-acid Gag, Prt, Pol, and Env proteins, respectively, displayed extended similarities to the corresponding retroviral proteins, as revealed by CD-search and BlastP analysis. As described recently, HML-5 proviruses once harbored a potentially active dUTPase enzyme within the prt ORF and also a domain encoding a retroviral aspartyl protease (30). As expected, the pol reading frame encoded retroviral reverse transcriptase, RNase H, and integrase domains. CD-search furthermore identified a
FIG. 1. Structures of 100 HERV-K(HML-5) provirus sequences. An ORF map of the HERV-K(HML-5) consensus sequence generated in this study, including retroviral gag, prt, pol, and env reading frames and protein domains, is shown at the top. Structures of proviral sequences with regard to the consensus sequence are depicted below, with horizontal lines indicating the presence of respective proviral portions. Numbers to the right refer to proviral locus numbers given in Table 1. Proviruses 117, 133, and 136 probably stem from genomic duplication events and are
HERV-K/MMTV-type gp36 domain within the Env protein C-terminal portion (Fig. 1). BlastP analysis of encoded retroviral proteins against the GenBank protein division revealed high similarities to retroviral proteins from the HERV-K(HML-2) family. Moreover, significant similarities to the betaretroviruses ovine pulmonary adenocarcinoma virus, MMTV, Mason-Pfizer monkey virus, and simian retroviruses 1 and 2 were detected, with identities to respective retroviral proteins ranging from 26 to 49% and similarities ranging from 41 to 64% (as determined by BLAST). Thus, our method was able to reconstruct all putative coding sequences of an ancient betaretrovirus.
The HML-5 sequence was initially assigned to the HERV-K superfamily due to the similarity in pol with other HERV-K families. Based on the analysis of an earlier draft version of the human genome sequence, Tristem suggested that the HERV-K(HML-5) primer binding site region is more similar to isoleucine tRNA than to lysine tRNA (43). We analyzed the HERV-K(HML-5) consensus primer binding site region with regard to its similarity to reported human tRNA sequences (22) (http://rna.wustl.edu/GtRDB/) and found that the HML-5 primer binding site is identical in sequence to the methionine tRNA 3′ end along at least 18 nucleotides (nt). Isoleucine tRNA displayed differences in three nucleotides, and lysine tRNAs displayed at least six different nucleotides (Fig. 2). Therefore, following the current HERV nomenclature, the HERV-K(HML-5) family should be designated HERV-M.
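The comparison described above amounts to counting mismatches between the primer binding site and each candidate tRNA 3′ end over the same window, then ranking the candidates. A minimal sketch follows; the function names are hypothetical, and any sequences passed in must be supplied by the user in the same orientation, since the actual PBS and tRNA sequences of Fig. 2 are not reproduced here.

```python
def count_mismatches(pbs, trna_3prime_end):
    """Number of positions at which two equal-length sequences differ."""
    if len(pbs) != len(trna_3prime_end):
        raise ValueError("sequences must cover the same window length")
    return sum(1 for a, b in zip(pbs.upper(), trna_3prime_end.upper()) if a != b)


def rank_trna_matches(pbs, candidates):
    """Sort candidate tRNA 3' ends by mismatch count, best match first.

    `candidates` maps a tRNA label to its sequence over the compared window.
    Returns a list of (mismatches, label) tuples.
    """
    return sorted(
        (count_mismatches(pbs, seq), label) for label, seq in candidates.items()
    )
```

With an 18-nt window, a perfect match yields 0 mismatches and sorts first, which is the criterion the text uses to assign the family to methionine tRNA rather than isoleucine or lysine tRNA.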
Phylogeny of the HERV-K(HML-5) family. Dot matrix comparisons between the different proviruses and the HERVK22 sequence flanked by LTR22A, as included in Repbase, showed that only a few entries presented more extended similarities with the LTR22A sequence. This finding suggested the existence of one or more LTR variants associated with HML-5 sequences and corroborated the definition of different LTR22 subfamilies in Repbase (LTR22, LTR22A, and LTR22B). For each proviral entry, we utilized RepeatMasker annotations to determine the presence of complete or deleted 5′ and 3′ LTRs (Table 2). We found that 70 proviral sequences were associated with full-length LTRs on both sides, 59 entries harbored at least one complete or deleted LTR on one side, and 10 entries lacked both LTRs. We generated a ClustalW multiple alignment with default alignment parameters for a total of 134 fairly complete LTR sequences.
Neighbor-joining analysis of the sequences subject to multiple alignment suggested the existence of several LTR subfamilies (Fig. 3). A major branch, displaying 100% bootstrap support (1,000 replicates), consisted of sequences similar to the LTR22A subfamily. Among those sequences, two subgroups displaying 63 and 91% bootstrap support could be distinguished. We named these two subgroups LTR22A and LTR22A2. Another group (99% support) consisted of sequences with similarity to the LTR22 subfamily. The majority of the remaining sequences were more similar to the LTR22B subfamily. Therefore, analysis of the entire HML-5 data set revealed three major LTR subfamilies (LTR22, LTR22A, and LTR22B) associated with the proviral sequences and is in accord with previous findings. In addition, LTR22A comprises a clearly distinguishable subgroup, named LTR22A2.
Based on the larger data set available in our study (40), we generated three new consensus sequences for the LTR22 families listed in Repbase and for LTR22A. The classification of LTR sequences in Table 2 is based on their phylogenetic relationship to those new LTR22 consensus sequences. The new consensus sequences for LTR22B and LTR22A displayed lengths of 497 and 457 bp, respectively, and were modestly different from the Repbase sequences. The 471-nt-long LTR22A2 consensus sequence presented 74% identity and 4% gaps compared to LTR22A. As revealed in this study, the previously established Repbase LTR22 consensus sequence was probably derived from three loci located on the Y chromosome (proviruses 117, 121, 131; Fig. 3) that are less representative for LTR22. The LTR22 consensus sequence generated in this study is 492 bp in length compared to the 580-nt Repbase sequence that furthermore included 47 ambiguous positions (International Union of Pure and Applied Chemistry codes).
We furthermore analyzed the phylogenetic relationships between HERV-K(HML-5) proviruses for five different proviral regions (representing all major retroviral genes) by neighbor-joining and bootstrap analysis. Phylogenetic trees displayed similar branch lengths with respect to the consensus sequence, suggesting a similar nucleotide divergence and age of proviral sequences (data not shown). Neighbor-joining analysis implied two HML-5 subfamilies, but they were not supported in the bootstrap analysis. Only a few chromosome Y sequences were distinguished from the remaining sequences with higher bootstrap support (Fig. 4). An exception was obtained for the region spanning the end of reverse transcriptase until the start of RNase H. In this case, a subfamily of 28 proviral sequences was distinguished with 86% bootstrap support that was mostly associated with LTR22A and LTR22A2 sequences. Eight more LTR22B sequences were separated with 90% support. The remaining 14 sequences (with low bootstrap support) were either LTR22 or LTR22B (data not shown). Taken together, analysis of four out of five internal regions supports the observation that HML-5 proviral bodies do not form distinct subgroups but rather represent a monophyletic group, in contrast to phylogenetically clearly different LTR sequences.
Age of the HERV-K(HML-5) family. A number of repetitive elements were found in HERV-K(HML-5) proviruses. In particular, reasonably old Alu subfamilies, such as AluSg, AluSc, and AluSp (Table 1), with approximate evolutionary ages of 31, 35, and 37 million years (16), respectively, suggested an evolutionary age of HML-5 proviruses higher than that of other
therefore summarized in one line. Non-HERV-K(HML-5) insertions, such as Alu or L1 elements, present in some proviruses were excluded from the figure. Note that the majority of sequences display two commonly deleted regions within the gag-prt and the env genes, indicated by vertical dotted lines. Some sequences only comprise 5′ or 3′ portions. For gag-prt, 5′ and 3′ deletion points clustered between nt 768 to 960 and nt 2061 to 2319, respectively (52% being located at nt 768 and nt 2303). For env, clusters ranged from nt 5256 to 6328 and nt 6568 to 6641 (75% located at nt 5314 and 68% at nt 6641). The thick black lines numbered 1 through 5 indicate regions in the multiple sequence alignment for which phylogenetic analysis was performed. RT, reverse transcriptase.
TABLE 1. HERV-K(HML-5) loci investigated in this study^a

| No. | Chr. band | Size (bp) | Accession no. | Repeats |
|---|---|---|---|---|
| 1 | 1p35.2 | 829 | AL662906.4 (104050–104880) | |
| 2* | 1p22.2 | 3,721 | AC019187.3 (10111–13833) | |
| 3* | 1q22 | 5,409 | AL135927.14 (71991–77401) | AluSp/AluYa5 |
| 4 | 1q23.1 | 12,084 | AL359753.9 (44427–56512) | L1PA2/L1PA3 |
| 5* | 1q25.3 | 5,167 | AL133383.10 (88786–93954) | |
| 6 | 1q31.3 | 651 | AL592144.5 (148187–148839) | |
| 7* | 1q32.1 | 2,719 | AL359837.21 (112252–114972) | |
| 8* | 1q32.1 | 2,718 | AC022000.1 (57476–60195) | |
| 9* | 1q42.12 | 7,611 | AL162738.12 (4411–12023) | LTR12/LTR30/LTR5B |
| 10* | 1q43 | 5,604 | AL591686.9 (118293–123898) | AluY |
| 11 | 10q11.21 | 5,689 | AC068707.6 (109183–114873) | AluSp |
| 12 | 11q12.1 | 5,134 | AP001652.4 (156224–161359) | |
| 13* | 11q14.1 | 7,752 | AP003398.2 (153858–156829) | |
| 14* | 11q21 | 6,698 | AP000870.4 (65492–72191) | AluY |
| 15* | 11q22.1 | 4,878 | AP003403.2 (164665–169544) | |
| 16 | 11q22.1 | 771 | AP000798.4 (44733–45495) | |
| 17 | 11q22.1 | 603 | AP000798.4 (45495–46081) | |
| 18* | 12p13.31 | 4,026 | AC092746.9 (16238–20258) | |
| 19 | 12p13.31 | 242 | AC092746.9 (13074–13317) | |
| 20* | 12q12 | 4,957 | AC079601.22 (79916–84874) | |
| 21* | 12q12 | 4,887 | AC076972.16 (29993–34881) | |
| 22 | 12q15 | 4,267 | AC020656.30 (52510–56778) | LTR7B (in 5′ LTR) |
| 23* | 12q21.32 | 4,966 | AC087865.8 (151169–156136) | |
| 24 | 12q22 | 5,220 | AC018475.27 (165538–170759) | LTR30/LTR12/LTR12B |
| 25* | 12q23.1 | 6,091 | AC010203.13 (53087–59179) | |
| 26* | 12q24.31 | 3,930 | AC005858.1 (7141–11072) | |
| 27* | 13q12.12 | 2,594 | AL354798.13 (87421–90016) | |
| 28* | 13q21.33 | 5,034 | AL139001.14 (80673–85708) | |
| 29* | 13q33.1 | 4,742 | AL158063.12 (110923–115660) | |
| 30* | 14q21.1 | 2,851 | AL356800.3 (99279–102131) | |
| 31* | 14q21.2 | 3,680 | AL161752.4 (85483–89164) | |
| 32* | 14q31.1 | 8,654 | AC018513.5 (19822–28477) | L1PA6/LTR12B/LTR30/LTR12 |
| 33 | 14q32.33 | 5,131 | AB019437.1 (130399–135531) | |
| 34 | 15q15.1 | 4,345 | AC087878.7 (92196–96542) | AluSg |
| 35* | 15q21.2 | 6,206 | AC025040.7 (64493–65286) | L1PA4 |
| 36* | 15q21.3 | 4,766 | AC068723.5 (70564–75331) | |
| 37* | 16q11.2 | 4,828 | AC002519.1 (63018–67847) | L1M1 |
| 38* | 16q11.2 | 1,962 | AC116553.1 (78427–80390) | |
| 39 | 16q11.2 | 670 | AC106785.2 (45130–45801) | |
| 40 | 16q11.2 | 6,776 | AC106785.2 (47835–54612) | AluY/LTR19 |
| 41* | 18q12.1 | 4,077 | AC074237.4 (26672–30750) | |
| 42 | 18q21.2 | 687 | AC091135.9 (62209–62897) | |
| 43 | 18q21.2 | 2,626 | AC091135.9 (62917–65544) | AluY/AluSp |
| 44* | 19p13.11 | 5,715 | AC123912.1 (76824–82540) | AluY |
| 45* | 19p12 | 2,998 | AC011493.4 (81474–84473) | |
| 46 | 19p12 | 4,950 | AC011493.4 (6132–62291) | AluSc/AluY/LTR7/HERV-H |
| 47 | 19p12 | 5,495 | AC011503.4 (27693–33189) | AluSq |
| 48* | 19p12 | 3,594 | AC011503.4 (47638–51233) | AluSp/MER9 |
| 49 | 19q11 | 5,625 | AC006504.1 (39162–44788) | AluYc |
| 50* | 19q13.2 | 2,583 | AC005337.1 (9746–12325) | AluY/L1PA5 |
| 51 | 2p23.3 | 2,713 | AC010896.15 (27276–29990) | LTR17/AluY/AluY |
| 52* | 2p16.1 | 4,931 | AC010738.5 (72193–77125) | MER2 |
| 53* | 2p13.2 | 7,718 | AC007878.2 (183549–185134) | |
| 54* | 2q21.1 | 6,694 | AC068282.2 (50964–57659) | AluSc |
| 55 | 2q24.3 | 5,920 | AC092583.2 (123236–129157) | LTR12C |
| 56* | 2q31.1 | 7,767 | AC092641.2 (5721–13489) | |
| 72* | 4q32.1 | 6,276 | AC107056.5 (8220–14493) | |
| 73* | 5p14.3 | 3,537 | AC026722.4 (96454–99985) | |
| 74 | 5p14.1 | 662 | AC027333.5 (168790–169434) | |
| 75* | 5p13.2 | 7,877 | AC008925.4 (50885–58763) | AluY |
| 76* | 5q33.1 | 2,096 | AC022106.5 (50979–53076) | |
| 77* | 6p22.3 | 7,742 | AL591416.4 (97763–105506) | |
| 78* | 6p22.2 | 3,242 | AL031777.5 (62600–65843) | |
| 79* | 6p21.32 | 7,287 | AL645931.7 (57633–64454) | |
| 80 | 6p21.32 | 136 | AL645931.7 (64451–64588) | |
| 81* | 6p21.31 | 1,922 | AL138721.16 (57057–58972) | AluSg (in 5′ LTR) |
| 82* | 6p12.3 | 4,531 | AL359458.17 (66575–71107) | |
| 83* | 6p12.1 | 4,889 | AL450489.12 (45284–50174) | |
| 84* | 6q12 | 5,251 | AL078597.11 (46513–51765) | AluY (in LTR) |
| 85* | 6q14.1 | 7,151 | AL391843.13 (673–7825) | |
| 86* | 6q14.3 | 4,494 | AL357272.10 (14829–19324) | |
| 87* | 6q16.3 | 7,830 | AL161720.15 (12037–19868) | AluY |
| 88* | 7p14.1 | 4,097 | AC073068.8 (158047–162145) | |
| 89 | 7p14.1 | 640 | AC72061.8 (15984–16579) | |
| 90* | 7q21.11 | 1,619 | AC096562.1 (41174–42804) | |
| 91 | 7q22.3 | 5,307 | AC004855.1 (66147–71455) | |
| 92* | 7q31.31 | 7,753 | AC004536.1 (33287–41051) | |
| 93* | 8p23.2 | 5,113 | AC087369.5 (90669–95783) | |
| 94* | 8p23.2 | 5,043 | AC087369.5 (36733–41775) | |
| 95* | 8p23.2 | 5,819 | AC087369.5 (14280–20100) | |
| 96* | 8p23.1 | 6,272 | AC055869.4 (23969–27126) | AluY/LTR7/HERV-H |
| 97* | 8p21.2 | 1,480 | AC024958.8 (167718–169199) | |
| 98* | 8q11.1 | 5,213 | AC104576.5 (126768–131982) | AluY |
| 99* | 8q12.1 | 5,355 | AC091561.4 (180441–185797) | |
| 100* | 8q21.3 | 3,720 | AC110012.3 (134095–137806) | |
| 101* | 9p24.1 | 2,439 | AL133547.16 (38846–41286) | |
| 102* | 9p23 | 6,219 | AL451129.6 (16932–23152) | |
| 103* | 9q31.1 | 5,187 | AL353805.20 (30116–35304) | AluY |
| 104* | Xp21.3 | 7,304 | AC024024.6 (102529–109834) | L1PA5 |
| 105* | Xp21.3 | 1,130 | AC005297.1 (121542–122663) | |
| 106* | Xp11.1 | 2,576 | AL354756.18 (14248–16813) | LTR15 |
| 107* | Xq11.2 | 6,619 | AL158203.12 (31906–38526) | |
| 108* | Xq11.2 | 5,610 | AL353744.18 (117924–123535) | AluYb9 |
| 109 | Xq12 | 2,643 | AL445523.11 (68589–71233) | |
| 110* | Xq13.2 | 7,328 | AL356513.11 (68589–71233) | AluYb9 |
| 111* | Xq21.1 | 4,840 | AL592171.8 (18287–23126) | |
| 112* | Xq21.1 | 3,222 | AL592563.7 (16340–19563) | |
| 113 | Xq21.32 | 8,115 | AL390840.17 (189393–197509) | L1PA5/L1P (in 5′ LTR) |
| 114* | Xq22.1 | 6,473 | AL133277.12 (1570–8044) | |
| 115* | Xq27.3 | 3,446 | AL589669.11 (157039–160486) | |
| 116* | Xq28 | 5,541 | U69569.1 (1847–7389) | |
| 117* | Yp11.2 | 9,515 | AC007274.3 (24960–34472) | LTR6A/LTR12C |
| 118* | Yp11.2 | 4,211 | AC007275.4 (152708–156920) | LTR17 |
| 119* | Yp11.2 | 3,051 | AC016749.4 (131255–134304) | AluY |
| 120 | Yp11.2 | 8,661 | AC009952.4 (75135–83798) | AluY/MER11C |
| 121* | Yp11.2 | 7,321 | AC006986.3 (138260–145567) | AluY/MER11C |
| 122* | Yq11.221 | 5,070 | AC006383.2 (68883–73954) | |
| 123 | Yq11.222 | 5,120 | AC009233.3 (17211–22332) | |
| 124* | Yq11.222 | 4,573 | AC009233.3 (108140–112714) | AluY/LTR12 |
| 125* | Yq11.223 | 6,119 | AC007678.3 (88618–94734) | AluSc/AluSp/q |
| 126* | Yq11.223 | 5,261 | AC009494.2 (123311–128573) | AluSc (in 3′ LTR) |
HERV-K families. The age of a provirus can be estimated from sequence comparison of flanking 5′ and 3′ LTRs. Owing to the retroviral reverse transcription strategy, both LTR sequences are identical in sequence at the time of provirus formation. Without selective pressure, both LTRs independently accumulate mutations over time. Thus, sequence differences between a provirus' LTRs are an approximate measure of provirus age (8). We determined the degree of sequence divergence between 5′ and 3′ LTRs with larger overlapping portions for 53 HML-5 proviruses. We obtained sequence divergences ranging from 6 to 24% (mean, 12.9%; standard deviation, 3.87). These numbers equal an approximate age of 50 (±15) million years for the HML-5 proviruses (Table 2). We furthermore calculated the average ages of proviral sequences from Kimura-2-parameter corrected distances to the HML-5 consensus sequence for five different proviral regions, excluding gaps and CpG dinucleotides (16). Here, an average evolutionary age of about 60 (±27) million years was indicated. The age of roughly 55 million years from both analyses thus corresponds to an HML-5 integration time into the genome clearly before the evolutionary split of Old World from New World monkeys that took place about 40 million years ago. This observation is in contrast to other HERV-K families described until now because those are not present in New World monkeys, suggesting that HML-5 represents an ancient betaretrovirus family in primates. To investigate the possibility that HML-5 elements are present in New World primate genomes, we examined the species distribution of HML-5 gag and env regions by PCR with genomic DNA from hominoids, Old World monkeys, New World monkeys, and prosimians. The amplified portion in the gag gene was located outside the above-described commonly deleted region, whereas the amplicon in env included the frequently deleted region. We obtained gag and env PCR products of expected sizes from all species tested, except for prosimians (Fig. 5). This result was in good agreement with the above-mentioned age estimate from LTR and consensus sequence analysis. Full-length env genes, present in a minority of HML-5 proviruses in the human genome, were amplified as a PCR product from all species. PCR products corresponding to the env deletion variant were also amplified, indicating the presence of both longer and shorter env variants in all tested species.
We cloned PCR products from the gag region obtained from P. troglodytes, H. lar, M. mulatta, and A. seniculus and se-
FIG. 2. Sequence comparison of the HERV-K(HML-5) consensus sequence primer binding site with reported tRNA 3′ ends (http://rna.wustl.edu/GtRDB). Shown here are the HERV-K(HML-5) consensus primer binding site regions from Repbase (Rep) and as generated in this study (own) and the closest matching methionine, isoleucine, lysine, and arginine tRNA 3′ ends. Anticodon sequences are given in brackets.
TABLE 1 (continued)

| No. | Chr. band | Size (bp) | Accession no. | Repeats |
|---|---|---|---|---|
| 57 | 2q31.1 | 6,672 | AC078883.6 (131188–137864) | |
| 58* | 2q37.1 | 5,575 | AC009407.8 (76629–82201) | LTR10F/HERVIP10FH/LTR10F |
| 59* | 20p13 | 5,040 | AL049761.11 (10942–15983) | |
| 60 | 20p11.21 | 654 | AL031673.19 (32505–33148) | HERVIP10FH/AluY (in 3′ LTR) |
| 61* | 3q26.1 | 4,716 | AC018457.14 (44247–48964) | |
| 62* | 3q26.2 | 1,878 | AC008040.7 (24180–26059) | |
| 63* | 4p13 | 1,446 | AC096586.3 (21933–23380) | AluY |
| 64* | 4q13.1 | 2,569 | AC097648.2 (119741–122311) | |
| 65* | 4q13.3 | 1,288 | AC024722.5 (111362–112651) | |
| 66* | 4q13.3 | 4,976 | AC024722.5 (112653–117630) | LTR30/LTR12/L1PA3 |
| 67* | 4q21.23 | 3,800 | AC097488.2 (13713–17514) | |
| 68* | 4q24 | 4,831 | AC107381.2 (38346–43178) | |
| 69* | 4q28.3 | 6,627 | AC096763.2 (19403–26031) | |
| 70 | 4q28.3 | 6,635 | AC108867.3 (109215–115851) | LTR12C/LTR22 |
| 71* | 4q32.1 | 4,137 | AC009567.8 (47850–51988) | |
| 127* | Yq11.223 | 7,720 | AC009489.3 (81570–89291) | |
| 128* | Yq11.223 | 5,420 | AC007876.2 (100813–1062334) | |
| 129 | Yq11.223 | 12,395 | AC009239.3 (23104–35500) | LTR12B/LTR12C/LTR12/HERV9 |
| 130* | Yq11.223 | 8,661 | AC021107.3 (71095–79756) | AluY/MER11C |
| 131* | Yq11.223 | 7,336 | AC025227.6 (46631–53968) | |
| 132* | Yq11.223 | 1,397 | AC007320.3 (110244–111627) | |
| 133* | Yq11.223 | 9,515 | AC010080.2 (28375–37888) | LTR6A/LTR12C |
| 134 | Yq11.223 | 12,888 | AC006366.3 (74013–86902) | HERVIP10FH/AluYa5/LTR12C |
| 135 | Yq11.223 | 206 | AC01088.3 (205–412) | |
| 136* | Yq11.223 | 9,515 | AC009947.2 (47848–57361) | LTR6A/LTR12C |
| 137 | Yq11.223 | 886 | AC010153.3 (22418–23305) | |
| 138 | Yq11.23 | 886 | AC016728.4 (82188–83075) | |
| 139* | Yq11.23 | 9,515 | AC006991.3 (107651–117164) | LTR6A/LTR12C |

^a HERV-K(HML-5) loci are numbered consecutively. Chromosomal (Chr.) band locations as well as sizes of proviruses or remnants of loci are given. Accession numbers of finished GenBank human genome sequence entries, including nucleotide position within the entry, are given in the fourth column. The last column lists non-HERV-K(HML-5) repetitive elements within proviral sequences.
VOL. 78, 2004 CHARACTERIZATION OF HERV-K(HML-5) 8793
TA
BL
E2.
Ass
ignm
ent
ofL
TR
sequ
ence
sas
soci
ated
with
HE
RV
-K(H
ML
-5)
prov
iral
bodi
esa
No.
Ann
otat
ion
Age
No.
Ann
otat
ion
Age
No.
Ann
otat
ion
Age
Rep
base
Our
sR
epba
seO
urs
Rep
base
Our
s
5�L
TR
3�L
TR
5�L
TR
3�L
TR
5�L
TR
3�L
TR
5�L
TR
3�L
TR
5�L
TR
3�L
TR
5�L
TR
3�L
TR
122
Ano
22A
4822
A(�
)22
A22
A95
2222
B22
B2
22A
22A
22A
22A
50.8
4922
22B
22B
9622
2222
A2
22A
23
2222
(�)
5022
no97
22A
22A
(�)
22A
422
2251
22A
no22
A98
2222
2222
59.1
522
22(�
)52
2222
A(�
)22
A2
9922
no6
no22
5322
A22
A22
A22
A38
.410
0no
no7
22A
22A
22A
22A
5422
2222
2247
.910
122
B(�
)22
B22
B8
22A
22A
22A
22A
5522
2210
222
22am
b.am
b.50
.39
2222
5622
2222
2236
.110
322
22am
b.am
b.36
.710
2222
B22
B57
22B
22B
22B
22B
35.4
104
22A
no22
A11
22B
22B
(�)
22B
5822
2222
A2
105
22no
1222
2259
2222
22A
222
A2
47.7
106
no22
B22
B13
2222
22A
222
A2
40.6
60no
no10
722
B22
B22
B22
B51
.814
22no
2261
no22
22A
210
822
A22
A22
A22
A48
.615
2222
6222
A22
A22
A22
A30
.410
922
no16
22A
(�)
no63
22no
110
2222
2222
47.5
1722
A(�
)no
6422
A22
A22
A22
A52
.511
122
A(�
)22
A22
A18
nono
6522
A22
A(�
)11
2no
2222
B19
nono
6622
A22
A22
A22
A11
322
2220
22A
22A
22A
22A
38.5
6722
22(�
)11
422
22am
b.am
b.38
.721
22A
22A
22A
222
A2
38.2
6822
2222
2268
.611
522
A22
A22
A22
A34
.722
2222
(�)
6922
(�)
2222
A2
116
no22
2223
2222
22A
222
A2
45.3
7022
(�)
2211
7no
2222
B24
22B
22B
22B
22B
32.3
7122
2211
822
/22A
22/2
2A25
2222
7222
2222
2211
9no
no26
22A
22A
22A
22A
54.5
73no
22A
22A
120
22B
22B
22B
22B
86.7
2722
A22
A(�
)74
22A
no22
A12
1no
2222
B28
2222
22A
22A
3875
2222
122
22A
22A
22A
22A
2922
2222
A2
22A
42.2
7622
B22
B22
B22
B43
123
22B
22B
22B
22B
77.9
30no
no77
2222
2222
25.2
124
22(�
)no
3122
2278
22A
(�)
22A
22A
22A
125
22A
no22
A32
22B
22B
22B
22B
37.5
79no
no12
622
22am
b.33
22B
(�)
22B
22B
80no
no12
722
B22
B22
B22
B92
.734
no22
8122
Ano
128
no22
amb.
3522
A22
A(�
)22
A82
22A
22(�
)22
A12
922
22B
3622
22am
b.am
b.59
.783
2222
2222
130
22B
22B
3722
no22
A2
8422
2213
1no
2222
B38
2222
85no
2222
A2
132
nono
39no
22B
22B
8622
Ano
22A
133
no22
4022
B22
B22
B22
B54
.487
2222
2222
56.1
134
2222
4122
22(�
)22
A2
22A
256
.688
22A
22A
22A
22A
41.1
135
nono
42no
no89
22no
136
no22
4322
Ano
22A
90no
22B
137
22no
4422
Ano
22A
9122
B22
22B
138
22no
4522
A22
A22
A22
A68
9222
2222
2252
.813
9no
2246
2222
9322
A22
A22
A22
A52
.647
22A
22A
22A
9422
22A
22A
22A
70.6
aN
umbe
rsin
the
first
colu
mn
corr
espo
ndto
thos
ein
Tab
le1.
Ann
otat
ions
of5�
and
3�L
TR
sac
cord
ing
toR
epea
tMas
ker
and
toou
row
nan
alys
isar
ein
dica
ted.
Diff
eren
ces
betw
een
Rep
eatM
aske
ran
dou
row
nan
nota
tions
are
indi
cate
din
bold
.no,
the
corr
espo
ndin
gL
TR
was
mis
sing
;(�
),L
TR
sha
ving
reve
rse
com
plem
ent
orie
ntat
ions
asan
nota
ted
byR
epea
tMas
ker.
Our
own
anno
tatio
nsin
clud
eL
TR
sfo
rw
hich
subf
amily
assi
gnm
ent
was
poss
ible
.Her
e,“a
mb.
”in
dica
tes
that
ambi
guou
sL
TR
sco
uld
not
beas
sign
edto
apa
rtic
ular
fam
ily;t
hat
is,t
hey
grou
ped
phyl
ogen
etic
ally
betw
een
the
LT
R22
and
LT
R22
Bsu
bfam
ilies
(dat
ano
tsh
own)
.T
heev
olut
iona
ryag
esof
prov
irus
esin
mill
ions
ofye
ars
acco
rdin
gto
Kim
ura-
2-pa
ram
eter
corr
ecte
ddi
stan
ces
are
liste
dfo
rpr
ovir
uses
with
suffi
cien
t5�
and
3�L
TR
port
ions
.
8794 LAVIE ET AL. J. VIROL.
quenced, in total, 16 clones. The sequences were included inthe neighbor-joining analysis presented in Fig. 2. The nonhu-man sequences grouped among the human sequences; that is,they did not form separate branches or groups. Also, the per-centage of identity with the human consensus sequence wasvery similar: 82 to 91% compared to 78 to 92% for the humansequences. Therefore, our sample sequence data set does notindicate different evolutionary behavior of HERV-K(HML-5)homologous sequences in the examined nonhuman primates.However, more elaborate studies will be required to charac-terize precise evolutionary behavior of HERV-K(HML-5) ho-mologues in primates after their evolutionary separation fromthe human lineage.
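The percent identity of a cloned sequence to the consensus, as compared above for human and nonhuman sequences, can be computed from a pairwise alignment along the following lines. This is a sketch under stated assumptions: the function name `percent_identity` is hypothetical, the two strings are assumed to be already aligned to equal length, and gapped columns are simply skipped, which may differ from how the reported values were obtained. The example sequences are toy data, not HML-5 sequences.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned, ungapped columns of two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    toy_consensus = "ACGTACGTAC"  # toy data for illustration only
    toy_clone = "ACGTTCGT-C"
    print(round(percent_identity(toy_consensus, toy_clone), 1))
```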
DISCUSSION
HERV represent former exogenous retroviruses that are fossilized in the human genome. Closer examination of HERV sequences provides information on ancient primate-targeting retroviruses and retrovirus evolution in general. The completed sequence of the human genome provides an excellent source of information for such studies. In this paper, we set out to analyze a hitherto little-characterized HERV family, HERV-K(HML-5), in more detail. We found that only 9 out of the approximately 139 HML-5 loci (6%) displayed full-length retroviral genes flanked by LTRs on both sides. A higher number of HML-5 loci are defective regarding proviral structures; gag-prt and/or env regions are missing in about 50% of proviruses, and the start and end points of deletions obviously cluster within defined regions. Analogous observations were recently made for HERV-K(HML-3), displaying deletions within gag and pol (26). Amplification of deleted proviruses has been observed also for other HERV families; for example, deleted HERV-H elements have increased to high copy numbers during primate evolution in comparison to full-length HERV-H proviruses (24). Recombination on the DNA level, mutations during retroviral reverse transcription, or splicing of retroviral transcripts before reverse transcription and provirus formation could account for such deleted proviruses. Spliced and reintegrated transcripts have been observed for HERV (12, 21). Spliced human immunodeficiency virus type 1 transcripts were also found as cDNA along with full-length retroviral RNA during infection (20), further supporting provirus formation from spliced proviral transcripts in the evolutionary past. Besides active amplification of proviral sequences, about 5% of proviral loci probably arose passively from chromosomal duplications, owing to the fact that the human genome comprises about 5% of duplicated sequences (14). In contrast, we do not find evidence that HML-5 sequences were amplified by L1-mediated pseudogene formation, as recently described for HERV-W (6, 37).
HML-5 loci with both gag-prt and env deletions probably emerged during reverse transcription by recombination between RNA transcripts from proviruses having either deletion. In combination with HERV-K(HML-3) deletion variants (26), we note that despite the lack of one or more proviral regions, the 5′ intergenic and gag portions are usually present in obviously (retrovirus-like) retrotransposed proviruses. Those retroviral regions are known to encode the packaging signal Ψ that interacts on the RNA level with the Gag-encoded nucleocapsid (NC) protein. One may therefore hypothesize that interaction between retroviral RNA and NC is still essential during intracellular (retrovirus-like) retrotransposition of HERV. Alternatively, helper viruses could have been involved in the formation of new proviruses, and a packaging signal could have interacted with the helper virus' NC protein. Also in combination with previous results (9, 26), deletions within the env gene seem to occur recurrently, rendering the Env protein nonfunctional. Env-demolishing mutations could have resulted in decreasing production of infectious retrovirus. Such a decrease could have significantly added to the fading of exogenous stages, and env deletions may therefore represent another cause for the extinction of exogenous stages.
Our study shows that HML-5 sequences were fixed in an ancestral genome after the simian lineage had evolutionarily separated from the prosimian lineage but before the evolutionary separation of Old World and New World primates, as indicated by provirus ages of approximately 55 million years and corroborated by PCR examination of various primate species. Other HERV-K superfamily members were fixed in an ancestral genome approximately 30 to 40 million years ago, after the evolutionary split of Old World from New World primates (26, 28, 34, 39). To the best of our knowledge, HML-5 represents the oldest betaretrovirus in the primate genome known to date. Despite a relatively high proportion of incomplete HML-5 proviruses in the human genome, our study generated a consensus sequence displaying the four major retroviral ORFs gag, prt, pol, and env. The corresponding proteins displayed significant similarities to other betaretroviruses, and the recreated HML-5 consensus is therefore expected to be very close in sequence to the former exogenous precursor to HML-5 endogenous variants. Thus, this study reconstructed an ancient betaretrovirus that was targeting primates about 55 million years ago.
Phylogenetic analysis of several proviral regions indicated similar sequence divergence among HML-5 sequences, resulting in almost monophyletic tree structures. This finding indicates that probably all HML-5 sequences were generated in a relatively brief evolutionary period around 55 million years ago, the latter as revealed by LTR-LTR divergences and divergence from the consensus sequence. Formation of new proviruses then obviously ceased. Proviruses with deletions in gag-prt and env were probably also generated in a relatively short period of time. This observation is further confirmed by PCR analysis that revealed gag and env deletions in all tested HML-5-positive primate species. Thus, different HML-5 "master proviruses" with different genome structures were obviously active and generated provirus progeny. In this manner, HML-5 displays behavior similar to that of HERV-K(HML-3) (26). However, both families display different behavior than HERV-K(HML-2), which formed proviruses during the hominoid period as well as in recent human evolution (5, 7, 28, 33).
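The divergence from the consensus sequence mentioned above was measured as a Kimura-2-parameter corrected distance (17). A minimal sketch of that correction for two aligned sequences follows. The function name `k2p_distance` is hypothetical, and this sketch only skips gapped or ambiguous columns; it does not implement the exclusion of CpG dinucleotides used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # proportion of transitions
    q = transversions / sites  # proportion of transversions
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


if __name__ == "__main__":
    # Toy data: one transition in ten sites.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

For small divergences the corrected distance is close to the raw proportion of differing sites; the correction matters more as multiple substitutions at the same site become likely, which is the relevant regime for elements that are tens of millions of years old.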
Phylogenetic analysis of LTR sequences associated with proviral bodies revealed several apparent LTR families. However, phylogenetic analysis of proviral body sequences yielded monophyletic tree topologies for most examined regions. Only a region between reverse transcriptase and RNase H displayed branches with higher bootstrap support. Clearly, proviral bodies appear much more homogeneous in sequence than the associated LTR sequences. Thus, almost homogeneous proviral bodies were associated with clearly different LTR variants at the time of provirus formations. It is currently not known whether the different LTR variants were already present in the exogenous precursor(s) or represent derivatives from a germ line-fixed LTR founder family. In both cases, LTR sequences evolved in sequence independently from, and obviously more rapidly than, the proviral bodies. Reasons for apparently different evolutionary rates of LTRs and proviral bodies are currently not clear.
FIG. 3. Neighbor-joining analysis of 134 reasonably intact HERV-K(HML-5) LTR sequences. Shown here is a consensus tree from 1,000 bootstrap replicates. Specific bootstrap values at major nodes are indicated. Consensus sequences, as given in Repbase (Rep) and as generated in this study (own), were also included in the analysis. LTR22 and LTR22A sequences are clearly distinguished, with the latter comprising the so-called LTR22A2 group. Other sequences belong to the LTR22B group with some ambiguous sequences remaining (Table 2). Note that the LTR22 Repbase sequence groups with LTR22B sequences, opposed to LTR22own. Numbers of LTR sequences refer to the proviruses listed in Table 1. The 5′ and 3′ LTRs of a particular provirus located at immediately neighboring terminal nodes are indicated as 5/3. Note in this context that 5′ and 3′ LTRs of particular proviruses do not always group together, reminiscent of recently suggested HERV-involving genomic rearrangement events (13).
FIG. 4. Neighbor-joining analysis of HERV-K(HML-5) proviruses. The tree shown here was obtained from the analysis of a gag gene portion (region 1 in Fig. 1) and is a consensus tree from 100 bootstrap replicates. Numbers of proviral sequences refer to Table 1. Phylogenetic analysis included HERV-K(HML-5) homologous sequences from the same gag region, obtained by PCR from various primate species (for species abbreviations, see the legend for Fig. 5).
FIG. 5. Presence of HERV-K(HML-5) homologous sequences in various primate species. The schematic depicts localization of PCR primer pairs within the proviral body 5′ end and within the env gene region of the HERV-K(HML-5) consensus sequence, yielding PCR products of about 548 bp and about 1,635 or 300 bp. The latter number considers the common presence of deleted env gene regions. PCR results are presented in the first and second panel, respectively. Amplification of full-length env genes in all species but the prosimian could not be reproduced satisfactorily. A schematic of primate phylogeny and germ line fixation of HERV-K(HML-5) versus other HERV-K families is shown at the bottom. Species abbreviations are as follows: for Hominoidea, Hsa is Homo sapiens, Ptr is Pan troglodytes, Ppy is Pongo pygmaeus, and Hla is Hylobates lar; for Old World primates, Cgu is Colobus guereza, Msp is Mandrillus sphinx, and Mmu is Macaca mulatta; for New World primates, Ssc is Saimiri sciureus, Cja is Callithrix jacchus, and Ase is Alouatta seniculus; for prosimians, Nyc is Nycticebus coucang.
Whether HML-5 is ancestral to other HERV-K families currently seems unclear. Both the HML-5 and HML-6 families appear less related to the remaining HERV-K families, as evidenced by DNA sequence comparisons (32) and by phylogenetic comparison of dUTPase domains (30), for instance. In addition, this study revealed that the HML-5 consensus sequence primer binding site is identical to methionine but clearly less similar to lysine tRNA 3′ ends; therefore, it is actually a HERV-M family and adds to the lesser phylogenetic relationship with HERV-K. However, HML-5 is still more related to HERV-K than to other (beta)retroviruses. Also in the course of this study, when employing an updated tRNA sequence data set (22) we noted that the previously characterized HERV-K(HML-3) family primer binding site region (26) is more related to arginine and asparagine than to lysine tRNA 3′ ends, therefore actually requiring designation HERV-R or HERV-N. At the present time, the precise phylogenetic relationship of HML-5 to other HERV-Ks as well as endogenous and exogenous retroviruses must await further specific investigations. We are currently studying the remaining HERV-K families in a similar fashion. Certainly, results for other hitherto little-described HERV families will significantly contribute to such studies.
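The primer binding site comparison discussed above rests on the fact that a retroviral primer binding site is complementary to the 3′ end of the priming tRNA. A minimal sketch of such a comparison is given below. The helper names (`reverse_complement`, `pbs_matches`, `best_trna`) are hypothetical, and the sequences are short toy strings invented for illustration; they are not the HML-5 primer binding site or real tRNA sequences, and the study's own comparison may have been carried out differently.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def pbs_matches(pbs, trna_3prime_end):
    """Count positions at which a primer binding site equals the reverse
    complement of a tRNA 3' end (tRNA may be given with U or T)."""
    target = reverse_complement(trna_3prime_end.upper().replace("U", "T"))
    if len(pbs) != len(target):
        raise ValueError("PBS and tRNA 3' end must have the same length")
    return sum(1 for a, b in zip(pbs.upper(), target) if a == b)


def best_trna(pbs, candidates):
    """Name of the candidate tRNA 3' end with the most matching positions."""
    return max(candidates, key=lambda name: pbs_matches(pbs, candidates[name]))


if __name__ == "__main__":
    toy_pbs = "TGGCGACT"  # toy data, NOT the HML-5 primer binding site
    toy_trnas = {         # toy 3' ends, NOT real tRNA sequences
        "toy-A": "AGUCGCCA",
        "toy-B": "AGUCAACA",
    }
    for name, end in toy_trnas.items():
        print(name, pbs_matches(toy_pbs, end))
    print(best_trna(toy_pbs, toy_trnas))
```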
ACKNOWLEDGMENTS
This work was supported by grants from the Deutsche Forschungsgemeinschaft to J.M. (Ma2298/2-1) and E.M. (Me917/16-1). P.M. is supported by a fellowship from the Knut and Alice Wallenberg Foundation and grants from the Swedish Research Council, Ake Wiberg Foundation, and Magn Bergvall Foundation. W.S. is supported by DFG grant SCH214/7-3.
REFERENCES
1. Andersson, M. L., M. Lindeskog, P. Medstrand, B. Westley, F. May, and J. Blomberg. 1999. Diversity of human endogenous retrovirus class II-like sequences. J. Gen. Virol. 80:255–260.
2. Berkhout, B., M. Jebbink, and J. Zsiros. 1999. Identification of an active reverse transcriptase enzyme encoded by a human endogenous HERV-K retrovirus. J. Virol. 73:2365–2375.
3. Blond, J. L., D. Lavillette, V. Cheynet, O. Bouton, G. Oriol, S. Chapel-Fernandes, B. Mandrand, F. Mallet, and F. L. Cosset. 2000. An envelope glycoprotein of the human endogenous retrovirus HERV-W is expressed in the human placenta and fuses cells expressing the type D mammalian retrovirus receptor. J. Virol. 74:3321–3329.
4. Boissinot, S., and A. V. Furano. 2001. Adaptive evolution in LINE-1 retrotransposons. Mol. Biol. Evol. 18:2186–2194.
5. Buzdin, A., S. Ustyugova, K. Khodosevich, I. Mamedov, Y. Lebedev, G. Hunsmann, and E. Sverdlov. 2003. Human-specific subfamilies of HERV-K(HML-2) long terminal repeats: three master genes were active simultaneously during branching of hominoid lineages. Genomics 81:149–156.
6. Costas, J. 2002. Characterization of the intragenomic spread of the human endogenous retrovirus family HERV-W. Mol. Biol. Evol. 19:526–533.
7. Costas, J. 2001. Evolutionary dynamics of the human endogenous retrovirus family HERV-K inferred from full-length proviral genomes. J. Mol. Evol. 53:237–243.
8. Dangel, A. W., B. J. Baker, A. R. Mendoza, and C. Y. Yu. 1995. Complement component C4 gene intron 9 as a phylogenetic marker for primates: long terminal repeats of the endogenous retrovirus ERV-K(C4) are a molecular clock of evolution. Immunogenetics 42:41–52.
9. de Parseval, N., V. Lazar, J. F. Casella, L. Benit, and T. Heidmann. 2003. Survey of human genes of retroviral origin: identification and transcriptome of the genes with coding capacity for complete envelope proteins. J. Virol. 77:10414–10422.
10. Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Department of Genetics SK-50, University of Washington, Seattle.
11. Gifford, R., and M. Tristem. 2003. The evolution, distribution and diversity of endogenous retroviruses. Virus Genes 26:291–315.
12. Goodchild, N. L., J. D. Freeman, and D. L. Mager. 1995. Spliced HERV-H endogenous retroviral sequences in human genomic DNA: evidence for amplification via retrotransposition. Virology 206:164–173.
13. Hughes, J. F., and J. M. Coffin. 2001. Evidence for genomic rearrangements mediated by human endogenous retroviruses during primate evolution. Nat. Genet. 29:487–489.
14. International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.
15. Jurka, J. 2000. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16:418–420.
16. Kapitonov, V., and J. Jurka. 1996. The age of Alu subfamilies. J. Mol. Evol. 42:59–65.
17. Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
18. Kitamura, Y., T. Ayukawa, T. Ishikawa, T. Kanda, and K. Yoshiike. 1996. Human endogenous retrovirus K10 encodes a functional integrase. J. Virol. 70:3302–3306.
19. Lebedev, Y. B., O. S. Belonovitch, N. V. Zybrova, P. P. Khil, S. G. Kurdyukov, T. V. Vinogradova, G. Hunsmann, and E. D. Sverdlov. 2000. Differences in HERV-K LTR insertions in orthologous loci of humans and great apes. Gene 247:265–277.
20. Liang, C., J. Hu, R. S. Russell, M. Kameoka, and M. A. Wainberg. 2004. Spliced human immunodeficiency virus type 1 RNA is reverse transcribed into cDNA within infected cells. AIDS Res. Hum. Retrovir. 20:203–211.
21. Lindeskog, M., and J. Blomberg. 1997. Spliced human endogenous retroviral HERV-H env transcripts in T-cell leukaemia cell lines and normal leukocytes: alternative splicing pattern of HERV-H transcripts. J. Gen. Virol. 78:2575–2585.
22. Lowe, T. M., and S. R. Eddy. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964.
23. Lower, R., J. Lower, and R. Kurth. 1996. The viruses in all of us: characteristics and biological significance of human endogenous retrovirus sequences. Proc. Natl. Acad. Sci. USA 93:5177–5184.
24. Mager, D. L., and J. D. Freeman. 1995. HERV-H endogenous retroviruses: presence in the New World branch but amplification in the Old World primate lineage. Virology 213:395–404.
25. Mager, D. L., and P. Medstrand. 2003. Retroviral repeat sequences. In D. Cooper (ed.), Nature encyclopedia of the human genome. Macmillan Publishers Ltd., Hampshire, England.
26. Mayer, J., and E. Meese. 2002. The human endogenous retrovirus family HERV-K(HML-3). Genomics 80:331–343.
27. Mayer, J., E. Meese, and N. Mueller-Lantzsch. 1997. Chromosomal assignment of human endogenous retrovirus K (HERV-K) env open reading frames. Cytogenet. Cell Genet. 79:157–161.
28. Mayer, J., E. Meese, and N. Mueller-Lantzsch. 1998. Human endogenous retrovirus K homologous sequences and their coding capacity in Old World primates. J. Virol. 72:1870–1875.
29. Mayer, J., E. Meese, and N. Mueller-Lantzsch. 1997. Multiple human endogenous retrovirus (HERV-K) loci with gag open reading frames in the human genome. Cytogenet. Cell Genet. 78:1–5.
30. Mayer, J., and E. U. Meese. 2003. Presence of dUTPase in the various human endogenous retrovirus K (HERV-K) families. J. Mol. Evol. 57:642–649.
31. Mayer, J., M. Sauter, A. Racz, D. Scherer, N. Mueller-Lantzsch, and E. Meese. 1999. An almost-intact human endogenous retrovirus K on human chromosome 7. Nat. Genet. 21:257–258.
32. Medstrand, P., and J. Blomberg. 1993. Characterization of novel reverse transcriptase encoding human endogenous retroviral sequences similar to type A and type B retroviruses: differential transcription in normal human tissues. J. Virol. 67:6778–6787.
33. Medstrand, P., and D. L. Mager. 1998. Human-specific integrations of the HERV-K endogenous retrovirus family. J. Virol. 72:9782–9787.
34. Medstrand, P., D. L. Mager, H. Yin, U. Dietrich, and J. Blomberg. 1997. Structure and genomic organization of a novel human endogenous retrovirus family: HERV-K (HML-6). J. Gen. Virol. 78:1731–1744.
35. Morgenstern, B. 1999. DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15:211–218.
36. Mueller-Lantzsch, N., M. Sauter, A. Weiskircher, K. Kramer, B. Best, M. Buck, and F. Grasser. 1993. Human endogenous retroviral element K10 (HERV-K10) encodes a full-length gag homologous 73-kDa protein and a functional protease. AIDS Res. Hum. Retrovir. 9:343–350.
37. Pavlicek, A., J. Paces, D. Elleder, and J. Hejnar. 2002. Processed pseudogenes of human endogenous retroviruses generated by LINEs: their integration, stability, and distribution. Genome Res. 12:391–399.
38. Reus, K., J. Mayer, M. Sauter, H. Zischler, N. Muller-Lantzsch, and E. Meese. 2001. HERV-K(OLD): ancestor sequences of the human endogenous retrovirus family HERV-K(HML-2). J. Virol. 75:8917–8926.
39. Seifarth, W., C. Baust, A. Murr, H. Skladny, F. Krieg-Schneider, J. Blusch, T. Werner, R. Hehlmann, and C. Leib-Mosch. 1998. Proviral structure, chromosomal location, and expression of HERV-K-T47D, a novel human endogenous retrovirus derived from T47D particles. J. Virol. 72:8384–8391.
40. Smit, A. F. A. 1996. Structure and evolution of mammalian interspersed repeats. Ph.D. dissertation. University of Southern California, Los Angeles.
41. Sverdlov, E. D. 2000. Retroviruses and primate evolution. Bioessays 22:161–171.
42. Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
43. Tristem, M. 2000. Identification and characterization of novel human endogenous retrovirus families by phylogenetic screening of the human genome mapping project database. J. Virol. 74:3715–3730.