The Involucrin Gene of Old-World Monkeys and Other Higher Primates


The Involucrin Gene of Old-World Monkeys and Other Higher Primates: Synapomorphies and Parallelisms Resulting from the Same Gene-altering Mechanism¹

Philippe Djian and Howard Green Department of Cellular and Molecular Physiology, Harvard Medical School

The involucrin gene of platyrrhines and hominoids contains a segment of 10-codon repeats which were added vectorially at the same site in the coding region. We have now cloned and sequenced the involucrin gene of four cercopithecoid monkeys: two macaques (mulatta and fascicularis) and two Cercopithecus monkeys (aethiops and hamlyni). Each gene contains a similar segment of short repeats; some of these were added in a common anthropoid lineage, others were added in a common catarrhine lineage, and still others were added in a common macaque or Cercopithecus lineage. Repeats added before a lineage diverges become synapomorphies in the sister taxa resulting from the divergence. Repeats added independently in different diverged lineages become parallelisms. The synapomorphies are the result of the action of a targeted duplication mechanism acting in a common ancestral lineage, but the parallelisms are the result of the same duplication mechanism transmitted to successively divergent sublineages and acting independently in each.

Introduction

Involucrin is a protein of the epidermis and other stratified squamous epithelia. It is one of the substrates of epidermal transglutaminase, an enzyme that catalyzes the formation of a cross-linked envelope beneath the plasma membrane of the keratinocyte (Rice and Green 1977, 1979).

The nucleotide sequence of the anthropoid involucrin gene is known for three New-World monkeys (Tseng and Green 1989; Phillips et al. 1991) and five hominoids (Eckert and Green 1986; Djian and Green 1989a, 1989b, 1990; Teumer and Green 1989). In each of these anthropoid species, the coding region of the involucrin gene contains a segment of short repeats. Although the involucrin gene of nonprimate mammals (Tseng and Green 1990), of prosimians (Tseng and Green 1988; Phillips et al. 1990), and of tarsioids (Djian and Green 1991) also contains a segment of repeats, the anthropoid segment of repeats differs in location, sequence, and repeat length. The repeats in all anthropoids are similar, but there are enough differences to discriminate between matching and nonmatching repeats in different species and to identify duplication patterns within a lineage.

We now report the nucleotide sequence of the involucrin genes of two macaques, Macaca fascicularis and M. mulatta, and two guenons, Cercopithecus aethiops and C. hamlyni. Comparison of the repeats of these cercopithecines with those of hominoids and platyrrhines shows that the involucrin gene of all lineages of the higher primates has been extended by the same mechanism of vectorial repeat addition. This mechanism gives rise to both synapomorphies and parallelisms.

1. Key words: involucrin, Old-World monkeys.

Address for correspondence and reprints: Howard Green, Department of Cellular and Molecular Physiology, Harvard Medical School, 25 Shattuck Street, Boston, Massachusetts 02115.

Mol. Biol. Evol. 9(3):417-432. 1992. © 1992 by The University of Chicago. All rights reserved. 0737-4038/92/0903-0004$02.00

Downloaded from https://academic.oup.com/mbe/article/9/3/417/1037258 by guest on 08 December 2021

Methods

Esophageal keratinocytes of Macaca fascicularis were obtained through the courtesy of Dr. R. H. Rice (University of California at Davis). Epidermal keratinocytes were obtained from skin biopsies of Cercopithecus aethiops (#383-86) and of M. mulatta, performed by Dr. David Parritz (New England Primate Center, Harvard University). Keratinocytes of C. hamlyni (#BK6-588 IO4) were derived from a skin biopsy performed at the San Diego Zoo, through the courtesy of Drs. Donald Jenssen and Meg Smith.

Keratinocytes were grown on a layer of irradiated 3T3 cells (Rheinwald and Green 1977; Simon and Green 1985). Because keratinocytes of cercopithecoids multiplied slowly, contaminating fibroblasts grew appreciably, and DNA was prepared from the mixed cultures. Genomic libraries were made from size-selected fragments according to a method described elsewhere (Tseng and Green 1988; Simon et al. 1989), except that the plasmid used was pUC18. To obtain the first clone of the involucrin gene of M. fascicularis, we probed a genomic library of HindIII fragments with a 5.5-kb XbaI-EcoRI fragment containing the whole involucrin gene of Pongo pygmaeus (Djian and Green 1989a) and obtained a clone with an insert of 3.3 kb and containing the entire coding region. We then made a second library, of PstI-BamHI fragments, and obtained clones containing a 1.9-kb insert consisting of the coding region less 261 nucleotides at the 5' end (fig. 1). To obtain involucrin clones of the other monkeys, the 1.9-kb PstI-BamHI fragment of M. fascicularis was used as a probe. From a library of BamHI fragments of M. mulatta we obtained clones containing a 2.4-kb fragment including the entire coding region. From similar libraries of C. hamlyni and C. aethiops, clones containing a 2.2-kb BamHI fragment were obtained.

After progressive digestion with nuclease BAL31 (Poncz et al. 1982), the resulting overlapping fragments were subcloned into M13 (Messing and Vieira 1982). For each species, the nucleotide sequence of two independent clones was determined, first on one strand of clone 1 and then on the opposite strand of clone 2 (fig. 1). The alignment of repeats and the cladistic analysis were performed by eye.

Results

Restriction Maps

Most restriction sites located outside the segment of repeats of the cercopithecines are also present in the hominoids (Djian and Green 1990); the exceptions are (a) a HindIII site present 5' of the coding region in the four cercopithecines but absent from the other anthropoids and (b) another HindIII site in the coding region 5' of the segment of repeats in Cercopithecus hamlyni alone. The two clones sequenced for each species did not differ in size, except for Macaca mulatta, whose two clones differed in the size of the NdeI-XbaI fragment. Clone 2, which gave rise to the smaller fragment, lacked one of the 33 repeats of clone 1 (fig. 1) and diverged from it at two nucleotide positions. To confirm the polymorphism of the involucrin gene in M. mulatta, we digested genomic DNA from the same animal with NdeI and XbaI, electrophoresed the products through a 4% agarose gel (NuSieve; FMC), and examined them by Southern blotting. In addition to larger fragments, we observed two fragments of ~615 and ~645 bp, a pattern expected if one of two alleles lacked a single repeat between the NdeI and the XbaI sites indicated on figure 1.

FIG. 1.-Restriction map of involucrin gene. The coding regions of the four cercopithecine genes are shown as boxes with the segments of repeats stippled. Eight restriction sites found in all four cercopithecine genes are placed in the center of the figure, along a horizontal line. These sites are located outside the segment of repeats. The HindIII site 5' of the coding region is the only site restricted to the cercopithecines. The three framed sites are found in the prosimians as well as in the anthropoids. The circled BamHI site is present in the platyrrhines as well as in the catarrhines. The three underlined sites are found in hominoids, but it is not known whether they are also present in platyrrhines or prosimians. Distances separating restriction sites are given in base pairs. The distance between the framed PstI and HaeIII sites varies in the different species, depending on the size of the segment of repeats. At the top of the figure are shown HindIII and PstI sites confined to the guenons. At the bottom of the figure are shown NdeI and XbaI sites confined to the macaques; the distance between the two sites differs in Macaca fascicularis and in M. mulatta because the number of repeats in the involucrin genes of these two species is different. In M. mulatta, the two alleles sequenced differ by one repeat (30 bp); its location between the NdeI site and the XbaI site is indicated by a vertical bar through the segment of repeats. Arrows represent sequenced parts of overlapping M13 subclones. For each species, two independent genomic clones were sequenced.
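The fragment-size argument can be checked with a little arithmetic. The following is a minimal sketch, not part of the original analysis; the function and constant names are illustrative assumptions. It only uses the figures quoted in the text: a repeat is 10 codons (30 bp), and the two NdeI-XbaI fragments are approximately 615 and 645 bp.

```python
# Illustrative check of the Southern-blot reasoning (names are hypothetical).
CODON_LENGTH_BP = 3        # nucleotides per codon
CODONS_PER_REPEAT = 10     # each involucrin repeat is 10 codons long
REPEAT_LENGTH_BP = CODON_LENGTH_BP * CODONS_PER_REPEAT  # 30 bp


def repeats_missing(larger_bp: int, smaller_bp: int) -> float:
    """Number of whole repeats that would account for a fragment-size difference."""
    return (larger_bp - smaller_bp) / REPEAT_LENGTH_BP


if __name__ == "__main__":
    # Approximate fragment sizes reported for the two M. mulatta alleles.
    print(repeats_missing(645, 615))
```

A result of one repeat is what the text expects if one allele lacks a single repeat between the two sites; gel-based size estimates are approximate, so this is a consistency check rather than a measurement.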

General Features of the Coding Region

The coding region of the involucrin gene of the four cercopithecines is shown in figure 2. As in other anthropoids, it contains a segment of repeats at site M. The sequence of the coding region flanking this site in the four cercopithecines can be aligned with the corresponding sequences in the hominoids and platyrrhines. Six nucleotides and a one-codon insertion are synapomorphic for the cercopithecines (or cercopithecoids). Two nucleotides are synapomorphic for the macaques, and five are so for the guenons.

FIG. 2.-Coding region. The consensus sequence of five hominoids is derived from Eckert and Green (1986), Djian and Green (1989a, 1989b, 1990), and Teumer and Green (1989). Nucleotides differing from those of this sequence are shown for the sequences of the four cercopithecines and for the consensus sequence of three platyrrhines (Phillips et al. 1991). Hom = hominoids; Mmu = Macaca mulatta; Mfa = M. fascicularis; Cae = Cercopithecus aethiops; Cha = C. hamlyni; and Plat = platyrrhines. Boxed nucleotides are shared exclusively by the four cercopithecines, the two macaques, or the two guenons. Numbers in parentheses are codon numbers. For M. mulatta, codon numbers 3' of site M are those of the larger allele. The two clones of C. aethiops diverge in the third position of codon 80; those of M. mulatta diverge at the first position of codon 142. [...]

The Segment of Repeats

As in other anthropoid primates, the segment of repeats is composed of multiple duplicates of a 10-codon sequence belonging to two main types, A and B. The first three codons of A and B repeats are AAG CAC CTG and GAG CTC CCA, respectively; the remaining seven codons are identical in the two types. Repeats may be further distinguished by the presence of marker nucleotides deviating from the consensus sequence of either A repeats or B repeats. A small number of repeats lacking the first three codons cannot be classified as either A or B; they have been designated "X" repeats (Djian and Green 1990).
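The A/B/X distinction can be illustrated with a short sketch. This is not the authors' procedure (the alignment in the paper was done by eye); it assumes, for illustration only, that a repeat carrying marker nucleotides is assigned to whichever consensus its first three codons most resemble. The function names and the example sequences, written in the style of figure 3, are hypothetical inputs.

```python
# Hypothetical classifier for 10-codon involucrin repeats (illustrative only).
A_PREFIX = ("AAG", "CAC", "CTG")   # first three codons of the A consensus (from the text)
B_PREFIX = ("GAG", "CTC", "CCA")   # first three codons of the B consensus (from the text)
GAP = "---"                        # a deleted codon, as written in the figures


def mismatches(codons, consensus):
    """Count nucleotide differences between three codons and a consensus prefix."""
    return sum(x != y for codon, ref in zip(codons, consensus) for x, y in zip(codon, ref))


def classify_repeat(repeat: str) -> str:
    """Return 'A', 'B', or 'X' for a space-separated 10-codon repeat.

    Assumption: marker nucleotides are tolerated by choosing the nearer
    consensus; a tie is reported as '?' rather than guessed.
    """
    codons = repeat.split()
    if len(codons) != 10:
        raise ValueError("expected exactly 10 codons")
    head = codons[:3]
    if all(codon == GAP for codon in head):
        return "X"                 # first three codons missing
    to_a = mismatches(head, A_PREFIX)
    to_b = mismatches(head, B_PREFIX)
    if to_a == to_b:
        return "?"
    return "A" if to_a < to_b else "B"


if __name__ == "__main__":
    examples = [
        "AAG CAC CTG GAG CAG CAG GAG GGG CAG CTG",
        "GAG CTC CCA GAG CAG CAG GTG GGG CAG CTG",
        "GAG CAT CTG GAG CAG CAG AAG GGG CAG CTG",
        "--- --- --- GAG CAG CAA GAG GGG CAG CTG",
    ]
    for seq in examples:
        print(classify_repeat(seq), seq)
```

Choosing the nearer consensus is a design convenience: a strict exact match on the first three codons would reject repeats that differ from the consensus only by a marker nucleotide.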

The nucleotide sequence of the segment of repeats in the four species is shown in figure 3. Repeats on the same line are considered orthologous. A summary of the alignment of cercopithecine repeats with those of Pongo pygmaeus is shown in figure 4.

The Early Region

All anthropoid species examined to date (five hominoids and three platyrrhines) possess 10 repeats (here numbered 1-10) constituting the early region. The four cercopithecines are distinctive because all lack repeat 1; accordingly, this repeat must have been deleted in a common ancestral lineage; but, as we do not know whether repeat 1 is also lacking in the colobines, we cannot specify whether it was deleted in the common cercopithecoid lineage or later in the common cercopithecine lineage. Both guenons lack repeats 8 and 9 and share two shorter deletions, one consisting of three codons in repeat 2 and the other consisting of a single codon in repeat 10. The early region of the guenons is the most heavily deleted early region yet found. None of the cercopithecines possesses in its early region any repeats not present in the platyrrhines and hominoids.
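The bookkeeping behind "most heavily deleted" can be made explicit with a small sketch. The variable and function names are illustrative assumptions, and the sketch further assumes that the macaques have no whole-repeat deletions in the early region other than repeat 1, which the paragraph above does not state explicitly. Partial deletions (the three codons in repeat 2 and the single codon in repeat 10 of the guenons) are not modeled.

```python
# Illustrative tally of whole early-region repeats retained (names are hypothetical).
EARLY_REGION = set(range(1, 11))   # repeats 1-10 of the anthropoid early region

# Whole repeats deleted, per group. The macaque entry is an assumption:
# only the loss of repeat 1 is stated for all four cercopithecines.
DELETED = {
    "macaques": {1},
    "guenons": {1, 8, 9},
}


def retained(group: str) -> list:
    """Early-region repeats still present in the given group, in ascending order."""
    return sorted(EARLY_REGION - DELETED[group])


if __name__ == "__main__":
    for group in DELETED:
        print(group, retained(group), len(retained(group)))
```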

The Middle Region

In the hominoids, the middle region is composed of 17-25 repeats. Of these repeats, 17 are shared by all of five hominoids, while seven are shared by different hominoid sublineages. In the platyrrhines, there are 15 repeats in the middle region, but none match repeats of the hominoid middle region (Phillips et al. 1991).

A. Repeats Shared by Cercopithecines and Hominoids

All four cercopithecines possess substantial middle regions containing 16-18 shared repeats designated by Latin letters (figs. 3 and 4). As seven of these repeats correspond to repeats present in the hominoid middle region, they are also designated by the Greek letters used earlier to identify them in the hominoids (Djian and Green 1990). Each repeat shared by the cercopithecines and hominoids (a/α, b/β, g/γ, h/δ, w/υ, x/φ, and y/χ) is of the same type (A, B, or X) in the two taxa. This orthology is further supported both by seven coincident marker nucleotides in five of the repeats and by a common three-codon deletion in repeat g/γ (fig. 5). These seven repeats are considered synapomorphic for the catarrhines. As both guenons lack repeats x/φ and y/χ, these must have been deleted in an ancestor of the two species. The corresponding repeats are also deleted from the human gene but are not deleted from the gene of other hominoids (Djian and Green 1989b, 1990).

FIG. 3.-Segment of repeats in four cercopithecines. The repeats on the same line are orthologous. All repeats are numbered from 3' to 5', and their type (A, B, or X) is given. As the cercopithecines (or cercopithecoids) are the only anthropoids that have deleted entire repeats in the early region, we have retained a number for these deleted repeats. Repeats of the middle region shared by two to four cercopithecines are designated by Latin letters. Seven of these repeats shared by the hominoids are also designated by Greek letters. Nonconsensus (marker) nucleotides are indicated by boldface characters. Deleted codons or repeats are indicated by dashes. Duplicated blocks of repeats are framed and linked by a bracket; the Roman numeral bearing a prime designates the more recently generated block. The asterisk indicates uncertainty about the relative age of two duplicate blocks. The sequence presented for Macaca mulatta is that of clone 1 (fig. 1); clone 2 differs in that it lacks repeat 13 (f) and possesses an A, instead of a G, in the third position of the sixth codon of repeat 5. Marker nucleotides shared exclusively by the four cercopithecines are circled in the sequence of Cercopithecus hamlyni. Marker nucleotides shared exclusively either by the guenons or by the macaques are underlined in the sequence of C. hamlyni and M. mulatta, respectively.
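The inference from "which taxa share a repeat" to "which lineage added it" can be sketched as a most-recent-common-ancestor lookup. This is a simplified illustration, not the authors' cladistic analysis, which was performed by eye. The tree below is an assumption that treats the hominoids and platyrrhines as single tips and omits the colobines; the function names are hypothetical. It also assumes a single origin for each repeat, so that absence in a descendant is read as a later deletion.

```python
# Hypothetical sketch: place a repeat's addition on a simplified primate tree.
PARENT = {
    "M. mulatta": "macaques",
    "M. fascicularis": "macaques",
    "C. aethiops": "guenons",
    "C. hamlyni": "guenons",
    "macaques": "cercopithecines",
    "guenons": "cercopithecines",
    "cercopithecines": "catarrhines",
    "hominoids": "catarrhines",
    "catarrhines": "anthropoids",
    "platyrrhines": "anthropoids",
}


def path_to_root(taxon: str) -> list:
    """The taxon followed by each of its ancestors up to the root."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lineage_of_addition(carriers) -> str:
    """Smallest clade containing every taxon that carries the repeat."""
    carriers = list(carriers)
    if not carriers:
        raise ValueError("at least one carrier is required")
    candidates = path_to_root(carriers[0])
    for taxon in carriers[1:]:
        ancestors = set(path_to_root(taxon))
        candidates = [node for node in candidates if node in ancestors]
    return candidates[0]


if __name__ == "__main__":
    # A repeat present in both macaques and the hominoids but absent from the guenons.
    print(lineage_of_addition(["M. mulatta", "M. fascicularis", "hominoids"]))
    # A repeat shared only by the four cercopithecines.
    print(lineage_of_addition(["M. mulatta", "M. fascicularis", "C. aethiops", "C. hamlyni"]))
```

Under these assumptions, a repeat carried by the macaques and the hominoids is placed in the common catarrhine lineage even though the guenons lack it, which matches the reasoning that the guenon ancestor deleted it. Because the colobines are not represented, the sketch cannot distinguish a cercopithecine origin from a cercopithecoid one.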

FIG. 3. (Continued)

B. Repeats Present in Cercopithecines But Not Present in the Hominoids

After its separation from the cercopithecines, the common hominoid lineage generated 10 additional synapomorphic repeats (ζ-κ, ν, ο, π, σ, and τ). Except for repeat ζ, which has been deleted in Pongo (fig. 4), these repeats are found in Homo, Pan, Gorilla, Pongo, and Hylobates (Djian and Green 1990). In contrast, there is only one repeat (l) that is shared by the four cercopithecines and not by the hominoids. This repeat could have been added in either the cercopithecine lineage or the cercopithecoid lineage. After their divergence, the hominoids retained a common lineage long enough to add 10 repeats, whereas the cercopithecoid lineage diverged into multiple sublineages after addition of only one repeat. During this period of relative inactivity in repeat generation, there occurred 11 nucleotide substitutions synapomorphic for the cercopithecines or cercopithecoids; six were in the parts of the coding region flanking the segment of repeats, as mentioned above, one was in the early region of the segment of repeats, and four were in the seven repeats of the middle region shared with the hominoids (figs. 3 and 5).

FIG. 4.-Summary of alignment of repeats of cercopithecines with those of Pongo pygmaeus. In the middle region of the cercopithecines a Latin letter preceded by a dash denotes a repeat added in the common macaque lineage. A letter followed by a dash denotes a repeat added in the common guenon lineage. A letter between two dashes denotes a repeat shared by the four cercopithecines. The origin of a duplicated repeat, when known, is given in brackets (e.g., [a'] is a duplicate of a, and [a''] is a duplicate of a'). Similarly, blocks II', III', and IV' are duplicates of blocks II, III, and IV, respectively. Block I of the e/m extension (repeats af, ag, and ah) is a duplicate of block I of the middle region (repeats w/υ, x/φ, y/χ). An asterisk indicates uncertainty as to which of two duplicates is older. The thick vertical bar separates those repeats of Pongo pygmaeus that are not orthologous to any in the cercopithecines.

FIG. 5.-Repeats of middle region shared by cercopithecines and hominoids. Seven orthologous repeats are designated by both Latin and Greek letters. Individual repeats are numbered for each species. Marker nucleotides are indicated by boldface characters, and those shared by at least two species are framed. If Pan and Gorilla are sister species, as concluded from a previous study of this gene (Djian and Green 1989b), there is no (fortuitous) sharing of marker nucleotides inconsistent with the phylogenetic relatedness of the different cercopithecines and hominoids.

After divergence of the macaques from the guenons, the former generated 11 repeats in the middle region and the latter generated 12 repeats. The macaques subsequently deleted repeat q. The origin of a number of repeats can be traced to block duplications of preexisting repeats (figs. 3 and 4). The origin of a few repeats (l in the cercopithecines; f, i, u, and v in the macaques; and c, d, e, j, k, z, and ae in the guenons) cannot be traced, since they are not obvious duplicates of a more ancient repeat. Repeat f of both M. fascicularis and one polymorphic allele of M. mulatta and repeat aa of C. hamlyni were ultimately deleted. Additions and deletions in the cercopithecine segments of repeats are summarized in figure 6.

[Figure 6: tree diagram with terminal branches for M. fascicularis, M. mulatta, C. aethiops, and C. hamlyni, joined through the cercopithecines and the hominoids to the catarrhines, with the platyrrhines as the other anthropoid branch; additions (+) and deletions (-) of repeats are marked along each branch.]

FIG. 6.-Tree summarizing additions and deletions in segment of repeats. Entries within parentheses identify repeats. A plus sign indicates additions; a minus sign indicates deletions. When duplicate blocks can be identified, they are linked by arrows. If the relative ages of the two duplicate blocks are known, the arrow proceeds from the more ancient to the more recent. When deletions are of less than an entire repeat, codon positions at which the deletion begins and ends precede the repeat symbol in parentheses. The branch leading to the cercopithecines is dashed because it is not known whether deletion of repeat 1 and addition of repeat I occurred in an ancestor of cercopithecoids or in an ancestor of cercopithecines.


Late Additions to the Gene

Of eight anthropoid species examined earlier, six possess a late region; only one hominoid (Pan) and one platyrrhine (C. albifrons) do not (Djian and Green 1989b, 1990; Phillips et al. 1991). Although none of the four cercopithecines possesses a late region in its usual position 5’ of the middle region, repeats 19-27 of M. fascicularis are late additions within the middle region of what is probably one polymorphic form of the gene. The monkey containing it made a single involucrin of large size (Parenteau et al. 1987, fig. 1B, second lane from left), but other members of a small M. fascicularis population sampled made involucrins of other sizes, and one of these sizes is compatible with the absence of repeats 19-27.

The e/m Extension

We previously described a group of two to five repeats located at the 5’ end of the segment of repeats and shared by different hominoids. It is now evident that these repeats are also shared by the cercopithecines and therefore should be shared by other cercopithecoids. The four or five repeats of the e/m extension of the four cercopithecines are shown in figure 7, where each is aligned with a repeat in the corresponding location of the hominoids and platyrrhines. The e/m extensions of the cercopithecines match not only each other but also those of Pongo pygmaeus. Matching is supported by an identical repeat pattern (BABAA) and by three to six coincident marker nucleotides. Of the hominoids, only Pongo pygmaeus possesses the full five repeats; several repeats have been deleted in the human and Pan, and all have been deleted in Gorilla and Hylobates. It seems clear that the common catarrhine lineage possessed an e/m region consisting of five repeats; of these, the cercopithecines retained most or all, but the hominoids usually deleted most or all.
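
The repeat pattern used above (BABAA) can be illustrated with a minimal Python sketch. It assumes that a repeat is assigned to type A or type B according to which 30-base consensus it resembles more closely; the two consensus sequences are the unmutated A and B rows of figure 7, while the function names and the example block are hypothetical and are not taken from the sequenced genes.

```python
# Consensus 10-codon repeats of the two types, copied from unmutated rows of fig. 7.
CONSENSUS = {
    "A": "AAG CAC CTG GAG CAG CAG GAG GGG CAG CTG".replace(" ", ""),
    "B": "GAG CTC CCA GAG CAG CAG GAG GGG CAG CTG".replace(" ", ""),
}


def classify_repeat(seq):
    """Return 'A' or 'B', whichever consensus differs from seq at fewer bases."""
    def mismatches(kind):
        return sum(x != y for x, y in zip(CONSENSUS[kind], seq))
    return min(CONSENSUS, key=mismatches)


def repeat_pattern(repeats):
    """Pattern string for a block of 30-base repeats listed 5' to 3'."""
    return "".join(classify_repeat(r) for r in repeats)


# Hypothetical e/m extension: consensus repeats, one carrying a substitution.
A, B = CONSENSUS["A"], CONSENSUS["B"]
a_variant = A[:5] + "T" + A[6:]  # CAC -> CAT in the second codon
block = [B, A, B, A, a_variant]
pattern = repeat_pattern(block)
print(pattern)  # BABAA
```

Because the two consensus sequences differ at four bases in their first three codons, a repeat carrying a single marker nucleotide is still assigned to its own type in this sketch.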

Similarly, at the 5’ end of the segment of repeats of the platyrrhines there is a group of one to three shared repeats. These repeats are orthologous in different platyrrhine species, since they form the same repeat pattern, BAB, and share marker nucleotides (fig. 7). It is not clear whether these repeats and the repeats in the same location of the catarrhines have a common origin. The penultimate repeat in two platyrrhines and the penultimate repeat in the four cercopithecines share a marker nucleotide. It seems possible that the three 5’-most repeats in the e/m extension (repeats ah-aj) were generated in a common anthropoid ancestor while the next two repeats (af and ag) were generated more recently in a common catarrhine ancestor. If so, the former would be contemporary with the anthropoid early region, and the latter would be contemporary with the catarrhine middle region (hence the term “e/m”); but the evidence for this is slender, and it is more likely that the entire e/m extension is a separate part of the middle region and was generated independently in platyrrhines and catarrhines.

Duplicates of the e/m Extension

In the four cercopithecines and Pongo pygmaeus, repeats w/v, x/d, and y/x of the middle region match the three 3’-most repeats of the e/m extension (af, ag, and ah). As both blocks of three repeats form the pattern BAA and share two marker nucleotides (fig. 8), one block is clearly a duplicate of the other. Since both blocks are shared by the cercopithecines and Pongo pygmaeus, the duplication must have occurred in a common catarrhine ancestor.
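
The two criteria used here, an identical A/B pattern and coincident marker nucleotides, can be sketched in Python as follows. This is only an illustration under stated assumptions: a marker nucleotide is taken to be any base that departs from the consensus of its repeat type, the consensus sequences are the unmutated rows of figure 7, and the helper names and the two example blocks are hypothetical.

```python
# Consensus repeats of each type (unmutated rows of fig. 7), spaces removed.
CONSENSUS = {
    "A": "AAGCACCTGGAGCAGCAGGAGGGGCAGCTG",
    "B": "GAGCTCCCAGAGCAGCAGGAGGGGCAGCTG",
}


def mutate(seq, pos, base):
    """Return seq with the base at 0-based position pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


def markers(kind, seq):
    """Set of (position, base) pairs where seq departs from its consensus."""
    ref = CONSENSUS[kind]
    return {(i, b) for i, (a, b) in enumerate(zip(ref, seq)) if a != b}


def shared_markers(block1, block2):
    """Compare two blocks given as lists of (kind, sequence), 5' to 3'.

    Returns None when the A/B patterns differ; otherwise returns, for each
    pair of corresponding repeats, the markers present in both.
    """
    if [k for k, _ in block1] != [k for k, _ in block2]:
        return None
    return [markers(k, s1) & markers(k, s2)
            for (k, s1), (_, s2) in zip(block1, block2)]


# Hypothetical BAA blocks: two markers coincide, one is private to block 2.
A, B = CONSENSUS["A"], CONSENSUS["B"]
em_block = [("B", mutate(B, 22, "A")), ("A", A), ("A", mutate(A, 5, "T"))]
middle_block = [("B", mutate(B, 22, "A")),
                ("A", mutate(A, 12, "T")),
                ("A", mutate(A, 5, "T"))]
result = shared_markers(em_block, middle_block)
print(result)
```

In this invented example the two blocks share the pattern BAA and two coincident markers, which is the kind of evidence the text treats as indicating that one block is a duplicate of the other.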


[Figure 7: aligned 10-codon repeats of the e/m extension, each row labeled by repeat type (A or B) and repeat number. Platyrrhines: A. trivirgatus, S. oedipus, C. albifrons. Cercopithecoids: M. fascicularis, M. mulatta, C. aethiops, C. hamlyni. Hominoids: P. pygmaeus, human, P. paniscus.]

FIG. 7.-e/m extension. Each cercopithecine has four or five repeats in this region. The repeats of the hominoids are more frequently deleted; whereas all five repeats of the macaques are found in Pongo, only two remain in the human and in Pan, and none remain in Gorilla and in Hylobates. Repeats 1-3 of the e/m extension of the platyrrhines are aligned with repeats of the cercopithecines, but this alignment is uncertain. Marker nucleotides shared by both catarrhine superfamilies are encircled; those shared only within one of the three taxa are underlined. There are no marker nucleotides shared by hominoids and platyrrhines; one marker nucleotide shared by cercopithecines and platyrrhines is boxed.


[Figure 8: aligned repeats of block I* (e/m extension) and block I (middle region), each row labeled by repeat type (A or B) and repeat number, for C. aethiops, C. hamlyni, M. mulatta, M. fascicularis, and P. pygmaeus.]

FIG. 8.-Duplicates of e/m extension. The three 3’-most repeats of the e/m extension (block I*) match three repeats in the middle region (block I) in the four cercopithecines and in Pongo pygmaeus. Marker nucleotides coincident in the two blocks of repeats are encircled. Marker nucleotides coincident only in the two blocks of the macaques are underlined. Pongo pygmaeus possesses a late region (repeats 29-59) between the middle region and the e/m extension, but the four cercopithecines do not. Repeats 22-27 and 22-26 of the two guenons are additions to the 5’ end of the middle region in the guenon lineage (see figs. 3 and 4).


Synapomorphy versus Parallelism

For use in cladistic analysis, a derived character state must have evolved in a single lineage and been transmitted to two sister taxa. When two lineages evolve a similar character state independently, the evolutionary change is described as “parallelism” or “convergence.” Whether these two terms may be usefully distinguished has been questioned (Eldredge and Cracraft 1980, p. 71), but the occurrence of parallelism has been thought to depend somehow on related ancestry (Simpson 1961, p. 78; Mayr 1969, p. 202; Hecht and Edwards 1976; all cited in Eldredge and Cracraft 1980, p. 71).

Studies of molecular evolution of homologous genes in relation to phylogeny regularly show not only synapomorphic nucleotides but also fortuitously shared nucleotides discordant with phylogeny. Since there is no intrinsic means of distinguishing the two categories, shared nucleotides are assigned to one or the other according to their relative frequency (maximum parsimony, etc.); this is necessary because nucleotide substitutions in most genes are accidents of DNA replication and not the result of a systematic process acting on a precise target.

In contrast, the addition of repeats to the anthropoid involucrin gene is a systematic process targeted to one part of the gene. Throughout anthropoid evolution, the repeats added have been serial duplicates of the same 10-codon sequence first duplicated in the early history of that lineage. Although there has been variety in the number of repeats duplicated in a single event, and although there have also been anachronous additions, the process of repeat addition has been mainly vectorial: the oldest repeats are clustered at the 3’ end of the segment of repeats, and the most recently added repeats are clustered near the 5’ end. This order is of great value in analyzing the evolutionary changes. Moreover, the mechanism of repeat addition underlying this order has itself been transmitted throughout anthropoid evolution.
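
The consequence of vectorial addition for the order of repeats can be illustrated with a toy model in Python. It is a deliberately simplified sketch: it assumes that every event copies the 5’-most block and places the copy at the 5’ end, and it ignores deletions and the anachronous additions mentioned above. The function name and the block sizes are hypothetical.

```python
def add_repeats(segment, block_size, event):
    """Copy the 5'-most block of repeats and insert the copy at the 5' end.

    segment is a list of (label, event_added) tuples ordered 5' to 3'.
    Each copied repeat is stamped with the event that created it.
    """
    block = [(label, event) for label, _ in segment[:block_size]]
    return block + segment


# Start from a single copy of the 10-codon sequence, as in prosimians.
segment = [("R", 0)]

# Three successive duplication events of 1, 2, and 3 repeats (invented sizes).
for event, size in enumerate([1, 2, 3], start=1):
    segment = add_repeats(segment, size, event)

ages = [event_added for _, event_added in segment]
print(ages)  # [3, 3, 3, 2, 2, 1, 0]
```

Read 5’ to 3’, the events in this model run from most recent to oldest, which is the ordering that makes the relative ages of repeats readable from their positions.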

From comparisons of the repeat structure of different anthropoid species, it is clear that (1) there are both synapomorphic repeats and repeats acquired independently in parallel trends sustained in all sublineages; (2) because the repeats are sufficiently differentiated in sequence, it is possible to discriminate between synapomorphic repeats and the repeats that result from parallel addition; and (3) the mechanism for generating both kinds of repeats is the same.

The result of the process of repeat accumulation in the anthropoids is summarized in figure 9. In the early region, a single line rising to 10 repeats means that all 10 repeats found in anthropoid sublineages are synapomorphic. In the middle region, the lines for the platyrrhines and catarrhines separate, indicating that after the divergence their repeat additions are parallelisms; however, both within the catarrhines and within the platyrrhines, the repeats added up to the next divergence are synapomorphic for the resulting lineages. Finally, all late additions are parallelisms.
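
The rule applied in this paragraph can be stated compactly in Python: an addition is synapomorphic for two taxa when it occurred in a lineage ancestral to both, and independent when it occurred after their divergence. The sketch below assumes a simplified ancestry containing only the groups named in the text; the dictionary, the function names, and the returned labels are hypothetical conveniences, not part of the original analysis.

```python
# Simplified ancestry assumed for illustration: each taxon maps to its parent.
PARENT = {
    "platyrrhines": "anthropoids",
    "catarrhines": "anthropoids",
    "hominoids": "catarrhines",
    "cercopithecines": "catarrhines",
    "macaques": "cercopithecines",
    "guenons": "cercopithecines",
}


def lineage(taxon):
    """Return the taxon followed by all of its ancestors, nearest first."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def classify(added_in, taxon_a, taxon_b):
    """Classify a repeat addition relative to two taxa.

    added_in names the lineage in which the addition occurred.
    """
    in_a = added_in in lineage(taxon_a)
    in_b = added_in in lineage(taxon_b)
    if in_a and in_b:
        return "synapomorphy"
    if in_a or in_b:
        return "independent"
    return "absent"


early = classify("anthropoids", "macaques", "platyrrhines")
middle = classify("catarrhines", "hominoids", "platyrrhines")
shared_middle = classify("catarrhines", "hominoids", "macaques")
print(early, middle, shared_middle)
```

Two taxa that each carry their own "independent" additions of the same kind of repeat are the case the text calls parallelism; the early region of all anthropoids is the case it calls synapomorphy.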

As figure 9 illustrates, there is considerable variation in the total number of repeats generated within each taxonomic group, and there is overlap between the different groups. But the hominoids seem to have been the most active in repeat generation: the five hominoids, four cercopithecines, and three platyrrhines have an average of 44, 32, and 30 repeats, respectively.

The inherited mechanism responsible for targeted repeat addition may itself be a derived character of the anthropoids. On the other hand, the mechanism may be older than the anthropoids, and only its directed use may be confined to the anthropoids. There are examples in which the targeting of the mechanism within the involucrin gene (the location of the hot spot for repeat addition) departs from the strictly vectorial; but there are other examples in which it is strictly controlled (Simon et al. 1991). The degree of precision in the targeting and the size of the duplicated blocks are important factors permitting us to distinguish synapomorphy from parallelism.

[Figure 9: plot of the total number of repeats (vertical axis) against additions to the segment of repeats (horizontal axis: early, middle, late), rising from the single prosimian copy to the total for each anthropoid species.]

FIG. 9.-Increase in size of segment of repeats during anthropoid evolution. On the right is shown the total number of repeats in each of the 12 anthropoid species. These repeats were added, beginning early in a common anthropoid lineage, by addition of repeats of a 10-codon sequence present in prosimians in a single copy only. Repeats added to the early region are synapomorphic to all anthropoids. Repeats of the middle region were added independently in platyrrhines and catarrhines, an example of parallel evolution. Repeats were also added independently in hominoids, macaques, and guenons. The e/m extension is included in the middle region. Late additions were made independently in each species. Because a number was retained for deleted repeats in the early region, the total number of repeats added in the Cercopithecus monkeys is smaller than the largest number assigned to a repeat in figs. 4 and 7. Cal = Cebus albifrons; Soe = Saguinus oedipus; AtrL = A. trivirgatus, large allele; Hla = Hylobates lar; Ppy = Pongo pygmaeus; GgoL = Gorilla gorilla, large allele; Ppa = Pan paniscus; and Hsa = Homo sapiens, the largest allele known (Simon et al. 1991). Other abbreviations are as in the legend to fig. 2.

Acknowledgment

These investigations were aided by a grant from the National Cancer Institute.

LITERATURE CITED

DJIAN, P., and H. GREEN. 1989a. The involucrin gene of the orangutan: generation of the late region as an evolutionary trend in the hominoids. Mol. Biol. Evol. 6:469-471.


-. 1989b. Vectorial expansion of the involucrin gene and the relatedness of the hominoids. Proc. Natl. Acad. Sci. USA 86:8447-8451.

-. 1990. The involucrin gene of the gibbon: the middle region shared by the hominoids. Mol. Biol. Evol. 7:220-227.

-. 1991. The involucrin gene of tarsioids and other primates: alternatives in the evolution of the segment of repeats. Proc. Natl. Acad. Sci. USA 88:5321-5325.

ECKERT, R. L., and H. GREEN. 1986. Structure and evolution of the human involucrin gene. Cell 46:583-589.

ELDREDGE, N., and J. CRACRAFT. 1980. Phylogenetic patterns and the evolutionary process. Columbia University Press, New York.

HECHT, M. K., and J. L. EDWARDS. 1976. The determination of parallel or monophyletic relationships: the proteid salamanders-a test case. Am. Nat. 110:653-677.

MAYR, E. 1969. Principles of systematic zoology. McGraw-Hill, New York.

MESSING, J., and J. VIEIRA. 1982. A new pair of M13 vectors for selecting either DNA strand of double digest restriction fragments. Gene 19:269-276.

PARENTEAU, N. L., R. L. ECKERT, and R. H. RICE. 1987. Primate involucrins: antigenic relatedness and detection of multiple forms. Proc. Natl. Acad. Sci. USA 84:7571-7575.

PHILLIPS, M., P. DJIAN, and H. GREEN. 1990. The involucrin gene of the galago: existence of a correction process acting on its segment of repeats. J. Biol. Chem. 265:7804-7807.

PHILLIPS, M., R. H. RICE, P. DJIAN, and H. GREEN. 1991. The involucrin genes of the white-fronted capuchin and cottontop tamarin: the platyrrhine middle region. Mol. Biol. Evol. 8:579-591.

PONCZ, M., D. SOLOWEIJCZYK, M. BALLANTINE, E. SCHWARTZ, and E. SURREY. 1982. “Nonrandom” DNA sequence analysis in bacteriophage M13 by the dideoxy chain termination method. Proc. Natl. Acad. Sci. USA 79:4298-4302.

RHEINWALD, J. G., and H. GREEN. 1977. Epidermal growth factor and the multiplication of cultured human epidermal keratinocytes. Nature 265:421-424.

RICE, R. H., and H. GREEN. 1977. The cornified envelope of terminally differentiated human epidermal keratinocytes consists of cross-linked protein. Cell 11:417-422.

-. 1979. Presence in human epidermal cells of a soluble protein precursor of the cross-linked envelope: activation of the cross-linking by calcium ions. Cell 18:681-694.

SIMON, M., and H. GREEN. 1985. Enzymatic cross-linking of involucrin and other proteins by keratinocyte particulates in vitro. Cell 40:677-683.

SIMON, M., M. PHILLIPS, and H. GREEN. 1991. Polymorphism due to variable number of repeats in the human involucrin gene. Genomics 9:576-580.

SIMON, M., M. PHILLIPS, H. GREEN, H. STROH, K. GLATT, G. BRUNS, and S. A. LATT. 1989. Absence of a single repeat from the coding region of the human involucrin gene leading to RFLP. Am. J. Hum. Genet. 45:910-916.

SIMPSON, G. G. 1961. Principles of animal taxonomy. Columbia University Press, New York.

TEUMER, J., and H. GREEN. 1989. Divergent evolution of part of the involucrin gene in the hominoids: unique intragenic duplications in the gorilla and human. Proc. Natl. Acad. Sci. USA 86:1283-1286.

TSENG, H., and H. GREEN. 1988. Remodeling of the involucrin gene during primate evolution. Cell 54:491-496.

-. 1989. The involucrin gene of the owl monkey: origin of the early region. Mol. Biol. Evol. 6:460-468.

-. 1990. The involucrin genes of pig and dog: comparison of their segments of repeats with those of prosimians and higher primates. Mol. Biol. Evol. 7:293-302.

WALTER M. FITCH, reviewing editor

Received April 12, 1991; Revision received October 1, 1991

Accepted October 14, 1991
