THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1992 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 267, No. 24, Issue of August 25, pp. 17404-17408, 1992 Printed in U.S.A.

Novel Amino-terminal Propeptide Configuration in a Fibrillar Procollagen Undergoing Alternative Splicing*

(Received for publication, March 27, 1992)

Jean-Yves Exposito‡, Marina D’Alessio§, and Francesco Ramirez

From the Brookdale Center for Molecular Biology, Mt. Sinai School of Medicine, New York, New York 10029

We isolated overlapping cDNAs from embryonic libraries of the sea urchin Strongylocentrotus purpuratus coding for a fibrillar procollagen (2α chain) with a predicted molecular mass of about 320 kDa. The deduced primary structure of the echinoid chain consists of a 265-amino acid carboxyl-propeptide, a triple helical domain made of 337 uninterrupted Gly-X-Y repeats, and an unusually long amino-propeptide. Aside from a 10-cysteine globular region, a collagenous sequence, and a nonhelical segment, this protein domain includes a novel 4-cysteine motif repeated several times. Interestingly, preliminary evidence indicates that different combinations of the 4-cysteine repeats are encoded by alternatively spliced transcripts. Irrespective of this, the sea urchin 2α procollagen chain represents the longest fibrillar molecule identified to date by cDNA cloning experiments in both vertebrate and invertebrate organisms.

In higher vertebrates, five distinct collagen trimers (types I, II, III, V, and XI) participate in the formation of morphologically similar supermolecular aggregates, the quarter-staggered fibrils (for a recent review, see Ref. 1). The precursor procollagen subunits of fibrillar collagens exhibit the same overall structure consisting of a central triple helical domain flanked by carboxyl-terminal and amino-terminal propeptides. Unlike the first two domains, amino-terminal propeptides of various procollagen chains differ greatly in length and composition. Accordingly, three distinct amino-terminal propeptide architectures are recognized (2). The first consists of a globular region that harbors 10 similarly spaced cysteinyl residues, a collagenous sequence that may or may not be discontinuous, and a nonhelical segment that contains the N-proteinase¹ cleavage site (Structure I). All but the first subdomain are present in the second amino-terminal propeptide configuration (Structure II). In the third type of configuration (Structure III), the 10-cysteine subdomain is replaced by a long globular region divided by a 3-cysteine cluster into an upstream slightly basic segment and a downstream highly acidic sequence. Furthermore, one of the fibrillar procollagen chains, pro-α1(II), exists in both the first and second configuration because of alternative splicing of the sequence coding for the 10-cysteine globular region (3).

* This work was supported by Grant GM-41849 from the National Institutes of Health. This is article 106 from the Brookdale Center for Molecular Biology. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M92041.

‡ On leave of absence from the Institut de Biologie et Chimie des Proteines, Centre National de la Recherche Scientifique, Lyons, France.

§ On leave of absence from the International Institute of Genetics and Biophysics, Consiglio Nazionale delle Ricerche, Naples, Italy.

¹ The abbreviations used are: N-proteinase, the enzyme that cleaves the amino-terminal propeptide; C-proteinase, the enzyme that cleaves the carboxyl-terminal propeptide; kb, kilobase(s); bp, base pair(s); COLP2α, the gene coding for the S. purpuratus 2α procollagen.

In contrast to vertebrates and despite substantial morphological data, very little is known about the primary structure of fibrillar procollagen molecules in invertebrates. The sole exceptions are some partial sequences from cDNAs of the fresh water sponge Ephydatia mülleri and the Mediterranean sea urchin Paracentrotus lividus (4-6). Albeit incomplete, these data revealed a close evolutionary kinship between the vertebrate and invertebrate proteins. They also documented the functional contribution of several phylogenetically retained structures of the proteins to metazoan fibrillogenesis.

In addition to serving as supportive elements, collagens are intimately involved in a variety of physiological processes and cellular activities (7). Such a dual role is reflected in the complexity and diversification of the molecular circuitries that modulate collagen production during development and in the adult organism (8). The sea urchin represents an instructive and simple organism for studying collagen function and regulation during early animal embryogenesis. Sea urchin collagens have been implicated in different morphogenetic programs of the developing embryo, such as gastrulation and spiculogenesis (9-14). For example, expression of a nonfibrillar collagen gene (Spcoll) by the differentiating mesenchyme cells of the sea urchin Strongylocentrotus purpuratus has recently been shown to govern the point of differentiation at which these cells initiate biomineralization (14).

Because of our long-standing interest in collagen evolution, we have recently completed the determination of the sequences of two sea urchin fibrillar collagens, termed 1α and 2α chains (5, 6). Here, we present the data pertaining to COLP2α, the gene coding for the 2α procollagen chain of S. purpuratus.

MATERIALS AND METHODS

Embryo Cultures and Nucleic Acids Purification and Analysis-Collection of S. purpuratus gametes, in vitro fertilization, and embryo cultures were performed according to the standard protocol (15). Genomic DNA was purified from sperm as previously described (16). Total RNA, prepared according to a published protocol (17), was eluted twice through an oligo(dT)-cellulose column (Boehringer Mannheim). For Northern blot analysis, 1 µg of poly(A)+ RNA was fractionated through a 1% agarose gel containing 2.2 M formaldehyde, transferred onto a nitrocellulose filter (Millipore), and hybridized to a 500-bp EcoRI/KpnI probe corresponding to the 5′-foremost segment of COLP2α (Fig. 1).

FIG. 1. Partial restriction map of 2α procollagen cDNAs. Overlapping clones are depicted below a diagrammatic representation of the protein. In the amino-terminal propeptide domain (N-propeptide), the 12 repeats (R1-R12) are illustrated relative to the position of the encoding cDNAs. The sequences of repeats absent in clones F6 and F9 are indicated by the dotted lines. B, BamHI; E, EcoRI; K, KpnI; P, PstI; S, SacI; X, XbaI. The wavy line signifies the signal peptide. On the right side is a Northern blot of COLP2α showing multiple hybridizing bands whose approximate sizes are indicated to the left of the autoradiogram. C-propeptide, carboxyl-terminal propeptide.

FIG. 2. Computer alignment of P. lividus (L2α) and S. purpuratus (P2α) carboxyl-terminal propeptides. Amino acids are numbered on the right of the sequence from the initiation site of translation (see Fig. 3); they extend from immediately after the end of the triple-helical domain (see Fig. 6) to the termination codon. Stars indicate gaps inserted to give the best alignment. The postulated cleavage site for the C-proteinase is indicated by the arrow; the cross-linking site, the putative glycosylation site, and the 7 cysteinyl residues are designated by the boxes. [Alignment panel: P2α residues 2935 to 3198.]

cDNA Cloning and Sequencing-Approximately 5 µg of late gastrula stage poly(A)+ RNA was utilized as a template to generate two embryonic cDNA libraries in the λgt10 and λgt11 vectors using oligo(dT) and random primers and following the recommendations included in a commercial kit (Amersham Corp.). One of the libraries was initially screened under relaxed conditions of hybridization (18) with a 660-bp BglII/EcoRI fragment of clone Uni 13, which codes for the carboxyl-terminal propeptide domain of the P. lividus 2α procollagen (6). Additional upstream screenings of both libraries were performed under stringent hybridization conditions utilizing appropriate S. purpuratus cDNA segments (19). Multiple overlapping sequencing of both DNA strands was achieved by sequencing small cDNA subclones, by generating progressive deletions with the exonuclease III/mung bean nuclease (U. S. Biochemical Corp.) method (20), and by employing synthetic oligonucleotides. Nucleotide sequence was determined with a modified protocol (4) of Sanger et al. (21) using the Sequenase enzyme (U. S. Biochemical Corp.). Sequence analysis was aided by the computer program MULTALIN. Oligonucleotides were synthesized by an Applied Biosystems model 380 synthesizer.

RESULTS

Cloning of 2α Procollagen cDNAs-A 2.5-kb cDNA (I2) was initially isolated from an S. purpuratus embryonic library by cross-hybridization with a probe (Uni 13) previously shown to code for the carboxyl-terminal propeptide domain of the P. lividus 2α procollagen (6) (Fig. 1). Comparison of the deduced amino acid sequences of I2 and Uni 13 showed a substantial degree of homology extending to the highly divergent carboxyl-telopeptide (Fig. 2). In addition, the two carboxyl-terminal propeptides display in identical positions a potential Asn-linked glycosylation site and the 7 cysteines which, in vertebrate chains, are involved in intracellular assembly of the procollagen trimer (Fig. 2). Based on these data, we concluded that the S. purpuratus clone represents the counterpart of the P. lividus 2α collagen gene product. Additional library screenings yielded eight overlapping clones covering a 9594-bp-long open reading frame (Fig. 1). As discussed more extensively below, some of the cDNAs provided evidence that COLP2α transcripts undergo alternative splicing.

Structure of 2α Procollagen-The conceptual amino acid translation of the sea urchin cDNAs revealed that this polypeptide is unique among vertebrate and invertebrate fibrillar procollagens, for its predicted molecular mass is ~320 kDa. Moreover, nearly 60% of the chain is accounted for by an unusually long amino-terminal propeptide domain characterized by a novel subdomain positioned between the 10-cysteine globular region and the collagenous sequence (Fig. 1).
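The size figures quoted above can be cross-checked with simple arithmetic. The following is only an illustrative sketch: the 9594-bp open reading frame comes from the text, the end of the amino-terminal prepropeptide (residue 1923) is read from the residue numbering of Fig. 3, and the average residue mass of 100 Da is an assumption chosen for a glycine-rich chain, not a value stated by the authors.

```python
# Back-of-envelope check of the chain length and mass quoted in the text.
ORF_BP = 9594                 # open reading frame length reported in the text
N_PROPEPTIDE_END = 1923       # last prepropeptide residue, per Fig. 3 numbering
AVG_RESIDUE_MASS_DA = 100.0   # ASSUMPTION: rough average for a Gly-rich chain


def residues_from_orf(orf_bp: int) -> int:
    """Number of codons in an open reading frame (3 bp per codon)."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3


def approx_mass_kda(n_residues: int, avg_mass_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Crude mass estimate: residue count times an assumed average residue mass."""
    return n_residues * avg_mass_da / 1000.0


n_residues = residues_from_orf(ORF_BP)
mass_kda = approx_mass_kda(n_residues)
n_fraction = N_PROPEPTIDE_END / n_residues  # share of the chain before the helix

print(n_residues, round(mass_kda), round(n_fraction, 2))
```

Under these assumptions the residue count, the rough mass, and the amino-terminal share of the chain all land close to the values given in the text; a different assumed residue mass would shift the mass estimate proportionally.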

The first subdomain of the sea urchin amino-terminal propeptide, the 10-cysteine globular region, begins 38 residues after the start site of translation immediately following a characteristic signal peptide sequence (Fig. 3). A remarkable similarity can be observed when the cysteine cluster of the sea urchin chain is aligned to the same region of the human procollagen chains (Fig. 4) (3). First, the spacing between the cysteinyl residues is maintained across collagens, with the exception of the interval between cysteines 6 and 7. This particular spacing is also different in the pro-α1(III) collagen chain (Fig. 4) (22). Second, and like the human chains, the invertebrate subdomain retains several invariant residues that are also conserved in the analogous cysteine-rich motif of thrombospondin (3, 23). Third, the echinoid and human sequences are nearly identical around the two consecutive cysteines likely to be engaged in interchain bonding.

FIG. 3. Amino-terminal prepropeptide sequence. Amino acids are numbered on the right from the start site of translation; they extend to the region immediately before the beginning of the triple-helical domain (see Fig. 6). The boundaries of the four amino-terminal propeptide subdomains are designated by the open triangles. In the cysteine-rich globular region, cysteinyl residues are boxed (see also Fig. 4). In the repeated subdomain, the boundaries and identity of each repeat are indicated by the horizontal arrows and numbers, respectively (see also Fig. 5). In the collagenous sequence, Gly-X-Y triplets are continuously underlined and the 4-amino acid interruption is highlighted by the dotted lines. Arrows indicate putative cleavage sites; structural elements discussed in the text are boxed. [Sequence panel: residues 1 to 1923.]

FIG. 4. Comparison of vertebrate and invertebrate 10-cysteine globular subdomains. Sequence alignment around the 10-cysteine cluster of the amino-terminal propeptide subdomain of the S. purpuratus 2α chain (P2α) and human α1(I), α1(II), α1(III), and α2(V) procollagens is shown. Cysteines (numbered in the amino to carboxyl direction) and invariant residues cited in the text are boxed. Only invariant residues were identified in this comparative analysis. Note that the ordering of the chain in the cross-alignment is arbitrary.

Following the 10th cysteine of the globular domain is the new subdomain, which is made of a 4-cysteine motif repeated 12 times (Fig. 3). With the exception of the first repeat, whose estimated pI is 8.3, the other 11 repeats are substantially acidic with theoretical pI values ranging from 3.8 to 4.2. A putative cell attachment sequence (Arg-Gly-Asp) (24) and a potential glycosylation site are noted in repeats 7 and 5, respectively. The homology between the 12 repeats can be readily appreciated when portions of their sequences are aligned (Fig. 5). From this alignment, the following consensus sequence can be derived: X(39)GX2LWX11GXGX39CX6CX2L/FX(23)CX(4)CX3 (where numbers in parentheses signify an average number of residues). A computer-aided search failed to identify appreciable homology between this consensus sequence and known peptide motifs.
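One way to make the consensus above concrete is to express it as a regular expression. The sketch below is an illustration, not the search procedure used by the authors. It assumes that spacers written without parentheses (X2, X11, X39, X6) are exact, loosens the parenthesized average spacers X(23) and X(4) into ranges chosen arbitrarily here, and leaves out the divergent leading X(39) and trailing X3. The test sequence is synthetic and is not COLP2α sequence.

```python
import re

# ASSUMPTION: exact spacers are taken literally; "average" spacers become ranges.
CONSENSUS = re.compile(
    r"G.{2}LW"      # G X2 L W
    r".{11}G.G"     # X11 G X G
    r".{39}C"       # X39 C   (first cysteine)
    r".{6}C"        # X6 C    (second cysteine)
    r".{2}[LF]"     # X2 L/F
    r".{15,30}C"    # X(23) C (third cysteine), average spacer as a range
    r".{2,6}C"      # X(4) C  (fourth cysteine), average spacer as a range
)


def find_repeat_cores(protein: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of non-overlapping consensus matches."""
    return [m.span() for m in CONSENSUS.finditer(protein)]


# Synthetic sequence built to satisfy the consensus (hypothetical, for testing only).
synthetic = (
    "GAALW" + "A" * 11 + "GAG" + "A" * 39 + "C" + "A" * 6 + "C"
    + "AAL" + "A" * 23 + "C" + "A" * 4 + "C" + "AAA"
)

print(find_repeat_cores(synthetic))                      # one match expected
print(find_repeat_cores(synthetic.replace("LW", "AA")))  # conserved Leu-Trp removed
```

Anchoring the pattern on the invariant Gly, Leu-Trp, and Cys positions rather than on the variable flanks keeps the expression short; tightening or widening the ranges changes how permissive the match is.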

Twenty-four Gly-X-Y repeated triplets constitute the collagenous sequence of the amino-terminal propeptide (Fig. 3). They contain a small interruption likely to render the upstream set of four triplets too short to participate in triple helical assembly. This subdomain is separated from the main triple helical domain by 21 residues containing a potential proteolytical signal (Ala-Gln) (25) (Fig. 3). Should the amino-terminal propeptide be cleaved, then a Lys-cross-linking site (26) would be located in the amino-telopeptide of the mature α-chain (Fig. 3).

FIG. 5. Cross-comparison of the 12 repeats of the sea urchin amino-terminal propeptide. The 12 repeats are indicated in the left column (R1-R12); sequences are aligned in the region displaying the highest level of homology. The length of the highly divergent amino-terminal position of each repeat is signified by X followed by the number of residues. The 4 cysteine residues are

The triple helical domain of the sea urchin 2α procollagen is made of 337 uninterrupted Gly-X-Y repeats. As previously shown for other invertebrate and lower vertebrate chains and unlike triple helical domains of higher vertebrates (4-6, 18), numerous Gly-X-Y triplets display a glycine residue either at position X or Y (Fig. 6). In addition, the 2α chain contains several putative cell attachment sequences but lacks the cross-linking sites normally found at both ends of the triple-helical domain (Fig. 6) (26).
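The two properties described here, an uninterrupted run of Gly-X-Y triplets and triplets carrying an additional glycine at X or Y, are easy to state as code. The sketch below is illustrative only. The helix boundaries (residues 1924 to 2934) are an assumption read from the residue numbering of Figs. 3 and 6, and the demo sequence is made up rather than taken from COLP2α.

```python
def split_triplets(seq: str) -> list[str]:
    """Cut a sequence into consecutive 3-residue triplets, dropping any remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def count_uninterrupted_gly_xy(seq: str) -> int:
    """Count Gly-X-Y triplets from the start until the first triplet not starting with G."""
    count = 0
    for triplet in split_triplets(seq):
        if triplet[0] != "G":
            break
        count += 1
    return count


def triplets_with_extra_gly(seq: str) -> list[tuple[int, str]]:
    """Return (index, triplet) for Gly-X-Y triplets that also carry Gly at X or Y."""
    return [
        (i, t)
        for i, t in enumerate(split_triplets(seq))
        if t[0] == "G" and "G" in t[1:]
    ]


# ASSUMPTION: helix boundaries read from the figure numbering, not from the text.
HELIX_START, HELIX_END = 1924, 2934
helix_length = HELIX_END - HELIX_START + 1
print(helix_length, helix_length // 3)   # residues in the helix, triplets in the helix

demo = "GPPGAGGGKGEP"                    # hypothetical stretch: GPP GAG GGK GEP
print(count_uninterrupted_gly_xy(demo))
print(triplets_with_extra_gly(demo))
```

With those boundaries the helix length divides evenly into the 337 triplets stated in the text, which is a useful consistency check on the numbering.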

The two major structural features of the carboxyl-terminal propeptide domain were already discussed in the previous section. In completing this description, a potential C-proteinase cleavage site (Arg-Asp) is predicted to reside 25 residues from the end of the triple helical domain (Fig. 2). If this assumption were correct, the mature 2α chain would contain a second Lys-mediated cross-linking site in the carboxyl-telopeptide (Fig. 2).

Evidence for Alternative Splicing of COLP2α-Sequence analysis of several amino-terminal propeptide-encoding cDNAs suggested that COLP2α transcripts undergo alternative splicing. To be precise, clone F6 spans the 5′-untranslated region to the end of repeat 11 and precisely lacks the sequence of repeats 6-8. Clone F9 begins at about the same position and ends in the second third of repeat 8. The sequence of repeats 2-5 is absent in this cDNA. Clone f4 harbors only sequence coding for the repeated subdomain, from the end of repeat 3 to the very beginning of repeat 10. Finally, clone f3 begins within repeat 9, includes repeats 10-12, and ends in the first third of the triple helical domain (Fig. 1). Collectively, the four overlapping clones cover 5769 bp of contiguous sequence coding for the 2α amino-terminal prepropeptide (Fig. 3).
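The clone descriptions above can be tabulated to show which repeats each cDNA skips and whether the four clones together account for all 12 repeats. This is a small bookkeeping sketch under one stated assumption: a repeat that a clone covers only partly (for example repeat 8 in F9, or repeats 3 and 10 in f4) is counted as present. The function names are hypothetical.

```python
# Repeats (1-12) represented in each clone, as described in the text.
# ASSUMPTION: a repeat that a clone covers only partly is counted as present.
CLONE_REPEATS = {
    "F6": {1, 2, 3, 4, 5, 9, 10, 11},   # lacks repeats 6-8
    "F9": {1, 6, 7, 8},                 # lacks repeats 2-5, ends within repeat 8
    "f4": {3, 4, 5, 6, 7, 8, 9, 10},    # end of repeat 3 to start of repeat 10
    "f3": {9, 10, 11, 12},              # starts within repeat 9
}


def skipped_repeats(present: set[int]) -> list[int]:
    """Repeats missing between the first and last repeat found in a clone."""
    return [r for r in range(min(present), max(present) + 1) if r not in present]


def covered_repeats(clones: dict[str, set[int]]) -> set[int]:
    """Union of repeats seen in at least one clone."""
    return set().union(*clones.values())


for name, present in CLONE_REPEATS.items():
    print(name, skipped_repeats(present))

print(sorted(covered_repeats(CLONE_REPEATS)))
```

Internal gaps, as in F6 and F9, are the pattern expected from exon skipping, whereas clones f4 and f3 are contiguous over the region they span.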

FIG. 6. Triple-helical domain sequence. Amino acids are numbered on the right from the initiation site of translation; they extend from immediately after the amino-terminal prepropeptide (see Fig. 3) to immediately before the carboxyl-terminal propeptide (see Fig. 1). Gly-Gly and Arg-Gly-Asp sequences are underlined by continuous and dotted lines, respectively. [Sequence panel: residues 1924 to 2934.]

Independent evidence for COLP2α alternative splicing was obtained by Northern blot hybridization to a probe specific for the region immediately upstream of the repeated subdomain of the amino-terminal propeptide. This identified at least three major hybridizing bands that range in size from approximately 8 to 10 kb (Fig. 1). Furthermore, preliminary analysis of cDNA amplified by the polymerase chain reaction technique strongly suggests the existence of additional alternatively spliced transcripts coding for distinct combinations of the amino-terminal propeptide repeats (data not shown). We believe that some of the background seen in the Northern blot of Fig. 1 might be caused by the hybridization of these alternative and plausibly minor transcripts. Experiments in progress are elucidating the exact number, relative representation, and composition of all COLP2α transcripts, along with the maximum length of the repeated subdomain.

DISCUSSION

The complete primary structure of two invertebrate fibrillar procollagens, the 1α and 2α chains, has been deduced from sequences of overlapping cDNA clones. The sea urchin 1α procollagen comprises 1414 amino acids and is evolutionarily related to the vertebrate pro-α2(I) collagen (27). In contrast, the hypothetical structure of the 2α chain is unique among invertebrate and vertebrate fibrillar procollagen molecules. The primary differences are the length and composition of the amino-terminal propeptide domain, which exhibits a novel configuration, Structure IV. In addition to the three characteristic subdomains of Structure I, Structure IV contains a very large subdomain consisting of a novel peptide module repeated several times. Moreover, the mesenchyme cells of late gastrula sea urchin embryos appear to produce alternatively spliced COLP2α transcripts theoretically encoding isoforms with amino-terminal propeptides of different length.

This is the second case of a fibrillar collagen gene whose amino-terminal propeptide-coding domain undergoes alternative splicing. In the vertebrate pro-α1(II) collagen, the exon coding for the 10-cysteine globular subdomain is in fact alternatively spliced in a developmentally regulated manner. Such a phenomenon was first recognized in human chondrocytes by Ryan and Sandell (3) and later confirmed in Xenopus and chicken embryos by other investigators (18, 28). The phylogenetic retention of the developmental pattern of alternative splicing has been interpreted as suggesting functionally distinct roles of the resulting pro-α1(II) products during vertebrate embryogenesis (18). It will be of interest to determine whether alternative splicing of COLP2α is also developmentally regulated and, more importantly, to understand the significance, if any, of producing amino-terminal propeptides of different sizes.

The contribution of collagen molecules to sea urchin development is well established. In vivo studies have documented the importance of properly formed collagen aggregates for the progression of the gastrulation process (9, 10). In vitro experiments have emphasized the role of the collagen substrate in promoting the biomineralization program of cultured cells (11-14). Our structural data are consistent with the proteolytic removal of the long amino-terminal propeptide of the 2α chain. It is, therefore, tempting to speculate that the 2α procollagen may yield two polypeptides nearly identical in size and serving distinct morphogenetic functions. Such a situation would be analogous to the postulated morphogenetic functions assigned to cleaved propeptides of vertebrate procollagens in cartilaginous matrices (29, 30). The availability of COLP2α clones provides a means to test this hypothesis, in that specific antibodies could be employed in interference experiments on micromeres differentiating in culture (14).

The three S. purpuratus collagen genes hitherto identified by cDNA cloning experiments are all expressed in the same cell lineages, but at different ontogenetic times (14).² It could be argued that a common regulatory program governs critical changes in matrix composition during the differentiation of primary and secondary mesenchyme cells. Conversely, each of these genes is likely to contain distinct cis-acting elements responsible for timely onset of transcription. Experiments currently in progress are characterizing the regulatory sequences of the three sea urchin collagen genes. These studies promise to elucidate some of the mechanisms and factors controlling collagen gene expression in the developing animal embryo.

² H. R. Suzuki, M. D’Alessio, J. Y. Exposito, R. Gambino, F. Ramirez, and M. Solursh, manuscript in preparation.

Acknowledgments-We thank James Andriotakis for excellent technical assistance, Roseann Lingeza for typing the manuscript, and Dr. Leslie Pick for many helpful suggestions.

REFERENCES

1. van der Rest, M., and Garrone, R. (1991) FASEB J. 5, 2814-2823
2. Lee, B., D’Alessio, M., and Ramirez, F. (1991) Crit. Rev. Eukaryotic Gene Expression 1, 173-187
3. Ryan, M. C., and Sandell, L. J. (1990) J. Biol. Chem. 265, 10334-10339
4. Exposito, J. Y., and Garrone, R. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 6669-6673
5. D’Alessio, M., Ramirez, F., Suzuki, H. R., Solursh, M., and Gambino, R. (1989) Proc. Natl. Acad. Sci. U. S. A. 86, 9303-9307
6. D’Alessio, M., Ramirez, F., Suzuki, H. R., Solursh, M., and Gambino, R. (1990) J. Biol. Chem. 265, 7050-7054
7. Hay, E. D. (1981) in Cell Biology of Extracellular Matrix (Hay, E. D., ed) pp. 139-156, Plenum Press, New York
8. Ramirez, F., and Di Liberto, M. (1990) FASEB J. 4, 1616-1623
9. Golob, R., Chetsanga, C. J., and Doty, P. (1974) Biochim. Biophys. Acta 349, 135-141
10. Wessel, G. M., and McClay, D. R. (1987) Dev. Biol. 121, 149-165
11. Blankenship, J., and Benson, S. (1984) Exp. Cell Res. 152, 98-104
12. Benson, S., Smith, L., Wilt, F., and Shaw, R. (1986) J. Cell Biol. 102, 1878-1886
13. Decker, G. L., Morrill, J. B., and Lennarz, W. J. (1987) Development (Camb.) 103, 231-247
14. Wessel, G., Etkin, M., and Benson, S. (1991) Dev. Biol. 148, 261-272
15. Leahy, P. S. (1986) Methods Cell Biol. 27, 1-13
16. Saitta, B., Butticè, G., and Gambino, R. (1989) Biochem. Biophys. Res. Commun. 158, 633-639
17. Cathala, G., Savouret, J. F., Mendez, B., West, B. L., Karin, M., Martial, J. A., and Baxter, J. D. (1983) DNA (N. Y.) 2, 329-335
18. Su, M. W., Suzuki, H. R., Bieker, J. J., Solursh, M., and Ramirez, F. (1991) J. Cell Biol. 115, 565-575
19. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., pp. 9.47-9.62, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
20. Henikoff, S. (1984) Gene (Amst.) 28, 351-359
21. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
22. Benson-Chanda, V., Su, M. W., Weil, D., Chu, M. L., and Ramirez, F. (1989) Gene (Amst.) 78, 255-265
23. Lawler, J., and Hynes, R. O. (1986) J. Cell Biol. 103, 1635-1648
24. Ruoslahti, E., and Pierschbacher, M. D. (1986) Cell 44, 517-518
25. Morikawa, T., Tuderman, L., and Prockop, D. J. (1980) Biochemistry 19, 2646-2650
26. Eyre, D. R., Paz, M. A., and Gallop, P. M. (1984) Annu. Rev. Biochem. 53, 717-748
27. Exposito, J. Y., D’Alessio, M., Solursh, M., and Ramirez, F. (1992) J. Biol. Chem., in press
28. Nah, H. D., and Upholt, W. B. (1991) J. Biol. Chem. 266, 23446-23452
29. Neame, P. J., Young, C. N., and Treep, J. T. (1990) J. Biol. Chem. 265, 20401-20408
30. van der Rest, M., Rosenberg, L. C., Olsen, B. R., and Poole, A. R. (1986) Biochem. J. 237, 923-925