PNAS - One of these, ME-1, contained a 2.3-kb cDNA insert that coded for a collagenous polypeptide....

Proc. Natl. Acad. Sci. USA
Vol. 91, pp. 4234-4238, May 1994
Biochemistry

α1(XVIII), a collagen chain with frequent interruptions in the collagenous sequence, a distinct tissue distribution, and homology with type XV collagen

MARKO REHN AND TAINA PIHLAJANIEMI*
Collagen Research Unit, Biocenter and Department of Medical Biochemistry, University of Oulu, FIN-90220 Oulu, Finland

Communicated by Elizabeth D. Hay, January 3, 1994 (received for review June 7, 1993)

ABSTRACT We report on the isolation of mouse cDNA clones which encode a collagenous sequence designated here as the α1 chain of type XVIII collagen. The overlapping clones cover 2.8 kilobases and encode an open reading frame of 928 amino acid residues comprising a putative signal peptide of 25 residues, an amino-terminal noncollagenous domain of 301 residues, and a primarily collagenous stretch of 602 residues. The clones do not cover the carboxyl-terminal end of the polypeptide, since the translation stop codon is absent. Characteristic of the deduced polypeptide is the possession of eight noncollagenous interruptions varying in length from 10 to 24 residues in the collagenous amino acid sequence. Other features include the presence of several putative sites for both N-linked glycosylation and O-linked glycosaminoglycan attachment and homology of the amino-terminal noncollagenous domain with thrombospondin. It is of particular interest that five of the eight collagenous sequences of type XVIII show homology to the previously reported type XV collagen, suggesting that the two form a distinct subgroup among the diverse family of collagens. Northern blot hybridization analysis revealed a striking tissue distribution for type XVIII collagen mRNAs, as the clones hybridized strongly with mRNAs of 4.3 and 5.3 kilobases that were present only in lung and liver of the eight mouse tissues studied.
The collagens comprise a large family of heterotrimeric or homotrimeric triple-helical proteins that constitute the major structural components of the extracellular matrix. Several other proteins are known to contain short triple-helical collagen domains but are not classified as collagens, as they do not participate in assembly of the extracellular matrix (1-3). The vertebrate collagens can be divided into two groups, fibrillar and nonfibrillar, on the grounds of their primary structure and supramolecular assemblies (1, 2). All collagen molecules contain a central collagen domain consisting of repeating Gly-Xaa-Yaa triplets and noncollagenous domains at their termini. The fibrillar group comprises the classical collagens, types I-III, and types V and XI. These molecules contain collagenous domains of about 1000 aa, highly conserved carboxyl-terminal noncollagenous domains of about 250 aa, and variable amino-terminal noncollagenous domains of 50-520 aa. The fibrillar collagens participate in highly ordered quarter-staggered fibrils that provide tensile strength for the tissues.

The nonfibrillar group comprises collagen types IV, VI-X, and XII-XVII (1-3). These molecules display great heterogeneity in structure, tissue location, macromolecular organization, and function. One common feature is that they all have one or more interruptions in the collagenous sequence. Their collagenous sequences vary in length from about 330 to 1400 aa, the shortest being found in type VI collagen molecules and the longest in type VII. Their carboxyl-terminal and amino-terminal noncollagenous domains also are highly variable in both sequence and length, the latter varying in both domains from <20 aa to several hundred amino acids. One subgroup among the nonfibrillar collagens is formed by the fibril-associated collagens with interrupted triple helices (FACIT): types IX, XII, and XIV (2). These collagens share sequence homology and do not appear to form polymers alone but are associated with fibrils composed of fibrillar collagens. Another subgroup is formed by the structurally homologous types VIII and X, which are thought to form sheets in the extracellular matrix (2). The recently described types XV (4), XVI (5), and XVII (6) differ from the other nonfibrillar collagens in being characterized by numerous interruptions in their triple-helical regions. Type XVI collagen shares some structural features with the FACIT collagens, as also does another recently characterized form called Y-collagen (7, 8). Type XVII (6), a hemidesmosomal protein also known as the 180-kDa bullous pemphigoid antigen, is unique among the collagens in that it is thought to be a transmembrane protein.

Collagen types XIII and XV and the α5 chain of type IV collagen were identified in our laboratory by screening of cDNA libraries under low stringency with probes encoding collagenous sequences (4, 9, 10). Recently we screened a mouse cDNA library to obtain clones coding for the mouse counterpart to the previously characterized human type XIII collagen (9). One of the positive clones was found to encode a collagenous protein not described before.† We present here a partial characterization of this polypeptide, which is characterized by multiple interruptions in the triple helix, and suggest that it should be designated the α1 chain of type XVIII collagen. Our findings indicate that type XVIII collagen has an unusual tissue location. Furthermore, type XVIII was found to be homologous with type XV, and the two thus form a subgroup among the collagens.

MATERIALS AND METHODS

Isolation of cDNA Clones and DNA Sequencing. A 500-bp clone, G2, encoding the α1 chain of murine type XIII collagen (unpublished results) was used as a probe to screen a mouse embryo λgt11 cDNA library (ML 1027a, Clontech) under stringent conditions (11).
The final wash for the filters was at 50°C in 0.5x standard saline citrate (SSC)/0.1% NaDodSO4. The recombinant phage ME-1 was isolated and the insert DNA was subcloned to the EcoRI site of pBluescript SK (Stratagene).

Abbreviation: FACIT, fibril-associated collagen(s) with interrupted triple helices.
*To whom reprint requests should be addressed at: Department of Medical Biochemistry, University of Oulu, FIN-90220 Oulu, Finland.
†The sequences reported in this paper have been deposited in the GenBank database (accession no. L16898).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

The nucleotide sequence was determined for




both strands of the cDNA by the dideoxynucleotide method (12) using the enzyme Sequenase (United States Biochemical) and vector- or insert-specific primers. The same library was screened with the ME-1 cDNA under stringent conditions as above, but the final wash was at 65°C in 0.5x SSC. The positive recombinant phages were isolated and characterized as above.

Northern Blot Analysis. A mouse multi-tissue Northern blot (Clontech) containing 2 µg of poly(A)+ RNA per sample isolated from various adult mouse tissues was hybridized under stringent conditions with the 32P-labeled probe SXT-5. Hybridization was carried out as suggested in the manufacturer's protocol except that the final wash at 65°C was in 0.2x SSC instead of 0.1x SSC. The intactness of the RNA samples on the blot was checked with the β-actin probe provided with it. The band intensities were scanned with a BioImage densitometer (Millipore).
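Densitometric readings of this kind are typically turned into relative transcript abundances by dividing each band intensity by the sum of the bands of interest. The following is a minimal sketch of that normalization only; the function name and the intensity values are hypothetical and are not taken from this work or from the densitometer software.

```python
def transcript_shares(band_intensities):
    """Return each band's share (in percent) of the summed band intensity.

    band_intensities maps a band label to a background-corrected
    intensity in arbitrary units (hypothetical input format).
    """
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("summed band intensity must be positive")
    return {band: 100.0 * value / total
            for band, value in band_intensities.items()}


# Made-up readings for one lane, in arbitrary units (not measured data).
example_lane = {"4.3 kb": 3.0, "5.3 kb": 1.0}
print(transcript_shares(example_lane))
```

With these made-up readings the 4.3-kb band accounts for three quarters of the summed signal.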

Sequence Analysis. Nucleotide and amino acid homology comparisons were carried out against the GenBank, EMBL, PIR, and Swiss-Prot databases at the National Center for Biotechnology Information with the BLAST network service (13). The search for functional patterns of amino acid sequences was carried out with the PROSITE database (14).
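A pattern search of this kind can be illustrated with regular expressions. The sketch below is not the PROSITE software itself. It rewrites the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P}, the Arg-Gly-Asp cell attachment tripeptide, and the Gly-Ser-Gly tripeptide as regular expressions and reports 1-based match positions. The function name and the toy sequence are hypothetical and do not come from the α1(XVIII) sequence.

```python
import re

# PROSITE-style motifs rewritten as regular expressions (illustrative set).
MOTIFS = {
    "N-glycosylation": r"N[^P][ST][^P]",   # N-{P}-[ST]-{P}
    "cell attachment (RGD)": r"RGD",
    "GSG": r"GSG",
}


def scan_motifs(sequence, motifs=MOTIFS):
    """Return (motif name, 1-based start, matched residues) sorted by start."""
    hits = []
    for name, pattern in motifs.items():
        # A lookahead with a capture group also reports overlapping matches.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Toy sequence made up for illustration only.
toy_sequence = "MAPNISLLGPRGDGSGSGALNGSP"
for hit in scan_motifs(toy_sequence):
    print(hit)
```

In the toy sequence the Asn-Gly-Ser near the end is not reported as an N-glycosylation hit because it is followed by proline, which the pattern excludes.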

RESULTS AND DISCUSSION

Isolation of Mouse cDNA Clones Encoding the α1(XVIII) Collagen Chain. A 500-bp cDNA, G2, that encodes the α1 chain of mouse type XIII collagen (unpublished results) was used as a probe to screen an 11.5-day mouse embryo cDNA library. Five positive signals were identified among ≈900,000 clones. One of these, ME-1, contained a 2.3-kb cDNA insert that coded for a collagenous polypeptide. Rescreening of about 600,000 recombinants of the same library with ME-1 resulted in the identification of 2 additional clones, SXT-1 and SXT-5, with inserts of 0.6 kb and 2.8 kb, respectively. Together these 3 clones cover 2.8 kb of the corresponding mRNA sequence (Fig. 1). The nucleotide and amino acid sequences derived from them were not compatible with any of the previously characterized collagens I-XVII or any other reported collagenous sequence (1-10). It is thus proposed that the polypeptide encoded by the clones should be designated the α1 chain of type XVIII collagen.
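The sizes quoted in this paper can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text: the 928-aa open reading frame and its 25-, 301-, and 602-aa parts, the 20 nt of 5' untranslated sequence, and the 113 interruption residues. The function and variable names are illustrative and are not part of the original work.

```python
# Figures quoted in the text.
SIGNAL_PEPTIDE_AA = 25
NONCOLLAGENOUS_AA = 301
COLLAGENOUS_AA = 602
OPEN_READING_FRAME_AA = 928
UTR_5_PRIME_NT = 20
INTERRUPTION_AA = 113


def consistency_check():
    """Recompute summary figures from the individually reported sizes."""
    domain_sum_aa = SIGNAL_PEPTIDE_AA + NONCOLLAGENOUS_AA + COLLAGENOUS_AA
    # Three nucleotides per codon; no stop codon is present in the clones.
    total_nt = UTR_5_PRIME_NT + 3 * OPEN_READING_FRAME_AA
    interruption_pct = 100.0 * INTERRUPTION_AA / COLLAGENOUS_AA
    return {
        "domain_sum_aa": domain_sum_aa,
        "total_nt": total_nt,
        "interruption_pct": interruption_pct,
    }


result = consistency_check()
print(result["domain_sum_aa"])            # 928, matching the reading frame
print(result["total_nt"])                 # 2804 nt, i.e. about 2.8 kb
print(round(result["interruption_pct"]))  # 19
```

The three parts add up to the 928-aa reading frame, the untranslated and coding nucleotides give about 2.8 kb, and the interruptions make up about 19% of the collagenous stretch, in agreement with the values reported.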

Partial Nucleotide and Amino Acid Sequences of the Mouse and Human α1(XVIII) Collagen Chains. The mouse clones encode an open reading frame of 928 aa preceded by 20 nt of 5' untranslated sequence (Fig. 2). The other reading frames contain multiple stop codons. The presumed translation initiation codon is encoded by nt 21-23. Sequences surrounding the codon for methionine match well with the best-conserved nucleotides (underlined) of the proposed consensus sequence for initiation of translation, GCC(R)CCAUGG (15). The amino-terminal end of the predicted polypeptide contains a hydrophobic sequence that clearly fulfills the criteria for a signal peptide, and on comparison with other proteins this sequence was found to be highly homologous with the signal peptide of decorin, the identity being 80% among the 10 residues preceding the proposed cleavage site for human decorin (data not shown; for decorin sequence, see ref. 16). Thus, comparisons with other proteins and prediction of the signal-peptide cleavage site by the method of von Heijne (17) led to the suggestion that the α1(XVIII) collagen chain has a signal peptide of 25 aa. Positions -3 and -1 are occupied by serine and alanine, residues frequently found in these positions (17). The presence of the signal peptide suggests that the polypeptide is secreted into the extracellular matrix.

The putative signal peptide is followed by a 301-aa noncollagenous domain that contains the only cysteine residues of the portion of the polypeptide encoded by the clones described here (Fig. 2). The rest of the sequence consists of a 602-aa primarily collagenous sequence (Fig. 2). The clones do not fully cover the carboxyl-terminal end of the predicted polypeptide, however, since the stop codon is lacking. A notable feature of the collagenous sequence is that it contains eight interruptions. The eight collagenous domains interspersed by the interruptions range in size from 21 to 122 aa, and five of the seven noncollagenous domains vary in size from 10 to 14 aa, while two are longer ones, of 23 and 24 aa. Furthermore, the four longest collagenous domains contain a total of five short imperfections that are due to the lack of one residue of the collagenous Gly-Xaa-Yaa triplet. The collagenous sequences are rich in proline, as this amino acid residue represents 27% of all the residues in the Gly-Xaa-Yaa triplets. Fifty-eight percent of the prolines are in the Yaa position and, thus, are subject to 4-hydroxylation (3). The polypeptide structure is presented schematically in Fig. 1, with the noncollagenous and collagenous domains numbered from the carboxyl-terminal end.

FIG. 1. cDNA clones encoding part of the mouse α1 chain of type XVIII collagen and schematic structure of the deduced polypeptide. (Upper) The overlapping cDNA clones ME-1, SXT-1, and SXT-5 and the locations of the EcoRI (E) and BamHI (B) restriction sites. The EcoRI site shown in parentheses represents a linker site introduced during cloning. (Lower) cDNA-derived polypeptide structure. The numbering of the noncollagenous (NC) and collagenous (COL) domains is shown below the polypeptide, and the lengths of these domains in amino acids are given above the polypeptide. The numbering of the domains begins from the carboxyl end of the polypeptide, based on carboxyl-terminal sequence characterized by Oh et al. (18). The dashed lines indicate that the clones do not cover the carboxyl-terminal end of the polypeptide. Thus, the COL2 domain is expected to be >11 aa. Dark box, signal peptide; ATG, putative translation initiation codon; stippled boxes, noncollagenous sequences; open boxes, collagenous sequences; NIS and NGS, potential N-linked glycosylation sites; GSG, potential O-linked glycosylation sites; C, cysteine; RGD, potential cell attachment site; tsp, thrombospondin homology area.

Oh et al. (18) have independently isolated cDNAs that also code for the polypeptide described here. The 5' sequences of their clones differ from the first 99 nt of our clones, which may indicate that the α1(XVIII) gene has alternative promoters or that its transcripts are subject to alternative splicing. As the clones by Oh et al. (18) cover the carboxyl-terminal end of the polypeptide, it can be estimated that our clones cover only part of the penultimate collagenous domain and lack sequences corresponding to the last collagenous domains and the carboxyl-terminal noncollagenous domain.

Types XVIII and XV Form a Subclass Within the Collagen Family. Since the total number of the residues in the interruptions is 113, 19% of the residues in the portion of the α1(XVIII) collagenous sequence described here are not located in the Gly-Xaa-Yaa repeats. The fact that it contains frequent interruptions in the collagenous sequence means


FIG. 2. Nucleotide and deduced amino acid sequences of cDNA clones for mouse α1 chain of type XVIII collagen. Noncollagenous sequences are shaded. The N(I/G)S and GSG sequences which may serve as sites for N-linked and O-linked glycosylation, respectively, are boxed, as is a potential cell attachment site, RGD. Arrow indicates the putative signal peptide cleavage site. Cysteine residues are circled and EcoRI sites are underlined.

that type XVIII collagen resembles three other collagen chains: α1(XV), α1(XVI), and α1(XVII). The α1(XV) chain has a 577-aa collagenous domain with 8 interruptions containing 33% of the collagenous-domain residues (4), α1(XVI) a 1244-aa collagenous domain with 9 interruptions containing 15% of the collagenous residues (5), and α1(XVII) an 86a collagenous domain with 12 interruptions hosting 361% of the residues (6). The human α1(XV) chain has been reported to consist of nine collagenous domains, termed here COL9-COL1 (numbered from carboxyl terminus to amino terminus), with sizes of 18, 114, 35, _.U, 71, IQ, la~, 55, and 15 aa, respectively (4). The four extreme carboxyl-terminal collagenous domains fully covered by the mouse α1(XVIII) clones are 42, 73, 33, and 21 aa [COL6-COL3 in Fig. 1, numbered from the carboxyl-terminal end of the polypeptide as suggested by Oh et al. (18)], being thus similar in size to the four underlined collagenous domains of the α1(XV) chain. Closer comparison indicates that these four collagenous domains of the α1(XVIII) and α1(XV) chains are homologous in their amino acid sequences (Fig. 3), this homology being most notable between the 71-residue COL5 domain of α1(XV) and the 73-residue COL5 of α1(XVIII), with 59% identity. The

FIG. 3. Comparison of homologous collagenous sequences between mouse α1(XVIII) and human α1(XV) chains. The number of the collagenous (COL) domain aligned is indicated for both chains, the numbering beginning from the carboxyl termini of the chains. The gaps indicated by dashes were introduced for maximal alignment of the chains. The numbers of amino acid residues are given by counting from the initiation methionine for the α1(XVIII) chain (see Fig. 2) and from the first amino acid residue reported for the partially characterized α1(XV) chain (4). The aligned domains are similar in size and, to varying extents, homologous in amino acid sequence. Two of the aligned pairs also contain similarly located short imperfections (indicated in bold type) in the repeating Gly-Xaa-Yaa sequences.


collagenous domain of the al(XV) chain preceding the fourunderlined domains (see above) is clearly different both insize, 35 aa, and in amino acid sequence from the correspond-ingly located 83-aa COL7 domain of the al(XVIII) chain.Interestingly, the next collagenous domain of the al(XV)chain, COL8, stands out again as being similar in size, 114 aa,to the 122-aa COL8 domain of al(XVIII). Alignment ofthesecollagenous domains in the two chains shows identity mainlyin sequences that involve Gly-Pro-Pro repeats at both ends ofthe domains (Fig. 3). Repeats of Gly-Pro-Pro triplets arecommonly found in collagen chains adjacent to noncollage-nous domains and thus do not necessarily point to a closeevolutionary relationship between these chains. The al(XV)COL8 and al(XVIII) COL8 domains nevertheless possess ashort imperfection in identical locations (Fig. 3), suggestingthat these domains are indeed homologous. The homologousCOL5 domains of the al(XV) and al(XVIII) chains alsocontain similarily located imperfections, suggesting that con-servation ofthe imperfections is functionally implicated (Fig.3).The al(XVIII) chain contains one more of the collagenous

domains at the beginning of the collagenous sequence thanthe al(XV) chain. The extreme amino-terminal collagenousdomain of the al(XV) chain does not correspond in eithersize or sequence to either of the extreme amino-terminalcollagenous domains of the al(XVIII) chain, COL9 andCOL8. It is thus not possible to fully align the two homolo-gous polypeptides, indicating that they cannot representdifferent a chains of the same collagen type. The type XVcollagen has hitherto been characterized via cDNA clones,and its function is not known (4), nor has it been found to behomologous with the FACIT subgroup of collagens or any ofthe other previously reported nonfibrillar collagens. Thus theamino acid sequence homology between collagen typesXVIII and XV indicates that they represent a subfamilywithin the heterogeneous family of collagens.Thrombospondin Homology and Multiple Potential Glyco-

sylation Sites in the al(XVIII) Polypeptide. Homologysearches against protein databanks showed the al(XVIII)polypeptide to be homologous to a large amino-terminalsegment of thrombospondin (Fig. 4), a multifunctional gly-coprotein with affinity for several molecules (24). Thisv200-aa noncollagenous segment has previously been iden-tified in the amino terminus of collagen types V, XI, and IXand has been found to be embedded in the large noncollag-enous amino-terminal domain of collagen types XII and XIV(22, 23). Furthermore, a proline- and arginine-rich protein[PARP, which may represent a fragment of the a2 chain oftype XI collagen (25)] has been found to contain this module(22). This sequence represents the amino-terminal heparin-binding domain of thrombospondin (24). The positionsthought to be involved in heparin binding are not, however,

conserved in any ofthe previously described collagens (22) orin the type XVIII collagen chain described here (Fig. 4). Thusthe significance of this thrombospondin homology in thevarious collagen chains is unknown.A search for structural motifs in the al(XVIII) polypeptide

sequence led to the identification of two putative sites forN-linked glycosylation, an Asn-Ile-Ser sequence in the NC11domain and an Asn-Gly-Ser sequence near a short interrup-tion in the COL8 domain (Figs. 1 and 2). Additional putativeglycosylation sites were located in the NC9 and NC8 domains(Figs. 1 and 2) in the form of two sequences that conform tothe consensus sequence [(Asp/Glu)-Xaa-Glu-Gly-Ser-Gly-Ser-Gly-Xaa-Leu] for O-linked glycosaminoglycan attach-ment in a number of proteins (26). Interestingly, theseputative NC9 and NC8 glycosylation sites were identical insequence for 6 aa, Asp-Met-Glu-Gly-Ser-Gly. As this se-quence represents the only internal homology among theal(XVIII) chain interruptions, the sequence conservationmay provide further evidence for utilization of these twosequences in glycosaminoglycan attachment. Putative glyco-sylation sites that conform less well to the consensus se-quence also exist, particularly in the NC11 domain. Thepossibility of type XVIII collagen containing a glycosamino-glycan side chain is supported by recent findings indicatingthe existence of such side chains in several collagens. Morespecifically, the FACIT collagens IX, XII, and XIV havebeen shown to contain a glycosaminoglycan side chain (27-29). Type XV collagen also contains multiple putative sitesfor both N- and 0-linked glycosylation (4), further highlight-ing the similarity between collagen types XVIII and XV.Searches for other biologically significant sequence motifsrevealed that the COL3 of al(XVIII) contains one Arg-Gly-Asp sequence that may play a role in cell attachment (30).This sequence is not found in the corresponding homologouscollagenous domain in type XV, however.

Restricted Tissue Distribution of Type XVIII Collagen Transcripts in Mouse. When a Northern blot containing poly(A)+ RNA isolated from mouse brain, heart, kidney, liver, lung, skeletal muscle, spleen, and testis was hybridized with the 2.8-kb probe SXT-5, a clear hybridization signal was visible with lung and liver mRNA after only 3 hr of autoradiographic exposure (data not shown). With both tissues the probe hybridized to a major 4.3-kb transcript and a somewhat less abundant 5.3-kb transcript, whereas these bands were absent from the mRNAs isolated from the other tissues, even after a prolonged exposure (Fig. 5). The major 4.3-kb transcript comprised 63% and 74% of the type XVIII collagen transcripts in the lung and liver tissue, respectively. Two faint bands of 3.8 kb and 4.7 kb, clearly differing in size from the strong lung and liver signals, were seen in all samples except the heart and skeletal muscle RNAs (Fig. 5). The 3.8- and 4.7-kb bands may be low-abundance alternatively spliced forms of the α1(XVIII) mRNA. Another explanation, however, may be that they are transcripts of a different gene. The same blot was also hybridized with a β-actin probe to confirm the intactness of the mRNAs isolated from each tissue. Strong hybridization signals were obtained from each tissue, with no sign of degradation of the RNAs (data not shown), thus excluding the possibility that the lack of α1(XVIII) transcripts in most tissues could be an artifact.

thrombospondin, human, N-term.      16 CGTNRIPESGGDN. SVFDIFELTGAA.RKGSGRRLVKGPDPSSPAFRIEDANLIPPV
collagen α1(V), human, N-term.      12 APPPSRAAQPADLLKVLDFHNLPDGITKTTGFCATRRSSKGPDVAYRVTKDAQLSAP
collagen α1(IX), human, N-term.     44 CPKIRIGQDDLPGFDLISQFQVDKA ..ASRRAIQRVVGSATLQVAYKLGNNVDFRIP
collagen α1(XVIII), mouse, N-term.  24 WAEPENVAEEVGLLQLL.GDPLPE KISQIDDPHVG ..... PAY.. IFGPDS

71 PDDKFQDLVDAVRTEKGFLLLASLRQMKKTRGTLLALERKD ... HSQQVFSVVSN . GKAGTLDLSLTVQ GKQHVVSVEE ... ALLATUQWK
69 TKQLYPASA.... FPEDFSILTTVKAKKGSQAFLVSIYNEQ ... GIQQIGLEL ... GRSPVFLYE. DHT GKPGPEDYPLFRGINLxXKWH
99 TRNLYPSG ... LPEEYSFLTTFRMTGSTLKKNWNIWQIQDSSGKEQVGIKINGQTQSWFSYK.GLD GSLQTAAFSN. LSSLF[)SQWH
67 NS(3QVAQYHFPKLFFRDFL LLFHVRPATEAAGVLFAI . TDAAQVVVSLGVKLSEVRDGQQNISLLYTEP GALrQTQT3AS,-FRL. PAFV(::QWT

15 ' SITLFVQEDRAQLYIDCEKMENAELD)VPIQSVFTRDLASIARLRIAKG(VNDNFQGVLQNVRFVF(TTPEDILRNKGC. SSSTS
149 RIALSVHKKNVTLILDCKKKTTKFLDRSDHPMIDINGIIVFGTRILDEEVFEG. .DI.QQLLFVSDH ... RAAYDYCEHYOPDC
180 KIMI(CVERSSATLFVD)CNRIESLPIKPRCP. . IDIEDGFAVLGKLADNPQVSVPF. EL . QWMLIHCDPL ... .RPRRETCHELPARI
15.6 HFALS:VEXI(CSVALYVDCEEFQRVPFARASQ(,-. LELERGAGLFVGQAGTADPDKF.... QCMISELKVRKTPRVSPVHCLDEEDD[)[)

FIG. 4. Thrombospondin homology in the amino-terminal noncollagenous domain of the mouse α1(XVIII) chain. The α1(XVIII) sequence is aligned here with thrombospondin and the α1(V) and α1(IX) collagen chains, the numbering indicating the number of amino acid residues in each polypeptide (19-24). The homologies of the α1(V) and α1(IX) collagen chains and certain other collagens with thrombospondin have been reported previously (22, 23). The conserved amino acid residues previously identified to be identical in thrombospondin and other matrix proteins (22) are indicated in bold type. The residues in the α1(XVIII) sequence that are identical to one or more of the other polypeptides shown here are marked with stars. The identified amino-terminal heparin binding sites in thrombospondin (24) are marked with bars.

[Northern blot image; size markers: 7.5 kb, 4.4 kb, 2.4 kb]

FIG. 5. Northern blot analysis of α1(XVIII) collagen mRNAs in mouse tissues. Each lane contained 2 µg of poly(A)+ RNA from the adult mouse tissues indicated. The blot was hybridized to a mouse α1(XVIII) collagen clone. Autoradiography time was 18 hr.

Conclusions. The mouse clones described here code for a unique polypeptide, designated as the α1 chain of type XVIII collagen. Altogether, 928 aa were determined, including a signal peptide of 25 aa, an amino-terminal noncollagenous domain of 301 aa, and a 602-aa stretch of a collagenous region. Type XVIII collagen was found to resemble type XV collagen (4) in containing multiple interruptions and imperfections in the collagenous sequences and in having several sequences that may serve as sites for N- and O-linked glycosylation. Several of the variable-length collagenous domains of the two types were found to be similar in both size and sequence, leading to the suggestion that collagen types XVIII and XV form a subclass within the large family of collagenous proteins.

The amino-terminal noncollagenous domain of type XVIII collagen contained an ~200-aa sequence that was homologous to thrombospondin. It has been reported that collagen types V, IX, XI, XII, and XIV contain this sequence module (22, 24), and we found it to be the only homology between type XVIII and the other collagens except for type XV. Thus, molecules belonging to different subclasses of collagens (i.e., the fibrillar and FACIT collagens, and also some other collagens) share this sequence module, although its functional significance in collagens is not known. Of interest is that the two cysteine residues that are conserved within this sequence in all collagens are known to form a disulfide bond in a proline- and arginine-rich protein, PARP (25). It thus seems likely that the only two cysteines found in the amino-terminal noncollagenous domain of the α1(XVIII) chain also take part in disulfide bond formation.

Type XVIII collagen mRNAs had a striking tissue distribution, as demonstrated by the clear Northern signal in liver and lung RNA but not in brain, heart, kidney, skeletal muscle, spleen, or testis RNA. Further research will be required, however, to obtain a complete picture of the pattern of expression of this collagen. The present finding of marked amounts of mRNAs only in liver and lung among the eight mouse tissues studied already justifies the suggestion that type XVIII collagen mRNAs have a distinct tissue distribution that is not similar to that of any of the previously described collagens.

We thank Aila Jokinen and Jaana Väisänen for expert technical assistance. This work was supported by grants from the Medical Research Council of the Academy of Finland and the Sigrid Jusélius Foundation.

1. Vuorio, E. & de Crombrugghe, B. (1990) Annu. Rev. Biochem. 59, 837-872.
2. van der Rest, M. & Garrone, R. (1991) FASEB J. 5, 2814-2823.
3. Kivirikko, K. I. (1993) Ann. Med. 25, 113-126.
4. Myers, J. C., Kivirikko, S., Gordon, M. K., Dion, A. S. & Pihlajaniemi, T. (1992) Proc. Natl. Acad. Sci. USA 89, 10144-10148.
5. Pan, T.-C., Zhang, R.-Z., Mattei, M.-G., Timpl, R. & Chu, M.-L. (1992) Proc. Natl. Acad. Sci. USA 89, 6565-6569.
6. Kehua, L., Tamai, K., Tan, E. M. L. & Uitto, J. (1993) J. Biol. Chem. 268, 8825-8834.
7. Yoshioka, H., Zhang, H., Ramirez, F., Mattei, M.-G., Moradi-Ameli, M., van der Rest, M. & Gordon, M. K. (1992) Genomics 13, 884-886.
8. Myers, J. C., Sun, M. J., D'Ippolito, J. A., Jabs, E. W., Neilson, E. G. & Dion, A. S. (1993) Gene 123, 211-217.
9. Pihlajaniemi, T. & Tamminen, M. (1990) J. Biol. Chem. 265, 16922-16928.
10. Pihlajaniemi, T., Pohjolainen, E.-R. & Myers, J. C. (1990) J. Biol. Chem. 265, 13758-13766.
11. Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab. Press, Plainview, NY), pp. 2.1-2.125.
12. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
13. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410.
14. Bairoch, A. (1992) Nucleic Acids Res. 20, 2013-2018.
15. Kozak, M. (1991) J. Biol. Chem. 266, 19867-19870.
16. Krusius, T. & Ruoslahti, E. (1986) Proc. Natl. Acad. Sci. USA 83, 7683-7687.
17. von Heijne, G. (1986) Nucleic Acids Res. 14, 4683-4690.
18. Oh, S. P., Kamagata, Y., Muragaki, Y., Timmons, S., Ooshima, A. & Olsen, B. R. (1994) Proc. Natl. Acad. Sci. USA 91, 4229-4233.
19. Lawler, J. & Hynes, R. O. (1986) J. Cell Biol. 103, 1635-1648.
20. Takahara, K., Sato, Y., Okazawa, K., Okamoto, N., Noda, A., Yaoi, Y. & Kato, I. (1991) J. Biol. Chem. 266, 13124-13129.
21. Muragaki, Y., Kimura, T., Ninomiya, Y. & Olsen, B. R. (1990) Eur. J. Biochem. 192, 703-708.
22. Bork, P. (1992) FEBS Lett. 307, 49-54.
23. Wälchli, C., Trueb, J., Kessler, B., Winterhalter, K. H. & Trueb, B. (1993) Eur. J. Biochem. 212, 483-490.
24. Adams, J. & Lawler, J. (1993) Curr. Biol. 3, 188-190.
25. Neame, P. J., Young, C. N. & Treep, J. T. (1990) J. Biol. Chem. 265, 20401-20408.
26. Bourdon, M. A. (1990) in Extracellular Matrix Genes, eds. Sandell, L. J. & Boyd, C. D. (Academic, Orlando, FL), pp. 157-174.
27. McCormick, D., van der Rest, M., Goodship, J., Lozano, G., Ninomiya, Y. & Olsen, B. R. (1987) Proc. Natl. Acad. Sci. USA 84, 4044-4048.
28. Bruckner, P., Vaughan, L. & Winterhalter, K. H. (1985) Proc. Natl. Acad. Sci. USA 82, 2608-2612.
29. Watt, S. L., Lunstrum, G. P., McDonough, A. M., Keene, D. R., Burgeson, R. E. & Morris, N. P. (1992) J. Biol. Chem. 267, 20093-20099.
30. Ruoslahti, E. (1991) J. Clin. Invest. 87, 1-5.
