J. Biol. Chem.-1984-Uhlén-1695-702
THE JOURNAL OF BIOLOGICAL CHEMISTRY
© 1984 by The American Society of Biological Chemists, Inc.
Vol. 259, No. 3, Issue of February, pp. 1695-1702, 1984
Printed in U.S.A.
Complete Sequence of the Staphylococcal Gene Encoding Protein A
A GENE EVOLVED THROUGH MULTIPLE DUPLICATIONS*
(Received for publication, August 4, 1983)
Mathias Uhlén, Bengt Guss, Björn Nilsson, Sten Gatenbeck, Lennart Philipson, and Martin Lindberg**
From the Department of Biochemistry, Royal Institute of Technology, S-100 44 Stockholm, Sweden and the Department of Microbiology, University of Uppsala, The Biomedical Center, Box 581, S-751 23 Uppsala, Sweden
The gene coding for protein A from Staphylococcus aureus has been isolated by molecular cloning, and a subclone containing a 1.8-kilobase insert was found to give a functional protein A in Escherichia coli. The complete nucleotide sequence of the insert, including the structural gene and the 5' and 3' flanking sequences, has been determined. Starting from a TTG initiator codon, an open reading frame comprising 1527 nucleotides gives a preprotein of 509 amino acids and a predicted Mr = 58,703. The structural gene is flanked on both sides by palindromic structures followed by a stretch of T residues, suggesting transcriptional termination signals. Thus, it appears that protein A is translated from a monocistronic mRNA. The sequence reveals extensive internal homologies involving a 58-amino acid unit, responsible for IgG binding, repeated 5 times and an 8-amino acid unit, possibly responsible for binding to the cell wall of S. aureus, repeated 12 times. Comparisons between the repeated regions show a marked preference for silent mutations, indicating an evolutionary pressure to keep the amino acid sequence preserved. The structure of the gene also suggests how the gene has evolved.
Evolution by gene duplication is a well known phenomenon among eukaryotic genes. The globin clusters, the immunoglobulins, and the interferon genes probably all have ancestral genes which have been duplicated and then diverged into functionally distinct genes (1). Examples of internally repetitive sequences have also been reported; rabbit skeletal tropomyosin contains a 7-residue amino acid periodicity throughout the molecule (2), and similar repeats have been reported for chicken fibronectin (3) and mammalian serum albumin (4). Among prokaryotes, most reports of duplicated genes have involved in vitro constructions (5), which seem to be stable in Escherichia coli, but dramatically unstable in Bacillus subtilis (6). However, the amino acid sequences of a few cell wall-bound proteins from Gram-positive bacteria have revealed remarkable periodicity, i.e. staphylococcal protein A (7, 8) and streptococcal M protein (9).
We have earlier reported on the molecular cloning of the gene for staphylococcal protein A in E. coli (10). This protein interacts with the Fc (constant part of immunoglobulins) domain of several immunoglobulins from many species including man and has therefore been used extensively for quantitative and qualitative immunological techniques (11). Amino acid sequence analysis of protein A revealed two functionally distinct regions of the molecule (7, 8). Both regions have remarkably repetitive structures.
The NH2-terminal part contains four or five homologous IgG-binding units consisting of approximately 58 amino acids each. The COOH-terminal part, which is thought to bind to the cell wall of Staphylococcus aureus, consists of several repeats of an octapeptide (Glu-Asp-Gly-Asn-Lys-Pro-Gly-Lys) (8).
In a previous report (10), we determined the nucleotide sequence of the promoter region, as well as the region coding for the NH2-terminal part of the protein. Here we report the complete nucleotide sequence of the protein A gene including the 5' and 3' flanking regions from the S. aureus strain 8325-4. The structural gene is 1,527 nucleotides long, giving a preprotein consisting of 509 amino acids and an Mr = 58,703. The repetitive structure of the gene has been clarified, which suggests how the gene has evolved.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Supported by grants from the Swedish National Board for Technical Development.
Present address, European Molecular Biology Laboratory, Heidelberg, Federal Republic of Germany.
** Supported by grants from the Swedish Medical Research Council and Pharmacia Fine Chemicals, Uppsala.
EXPERIMENTAL PROCEDURES
Bacterial Strains and Plasmids-E. coli strains HB101 (12) and GM161 (13) were used as bacterial hosts. The plasmid vectors were pBR322 (14), pTR262 (15), and pEMBL9 (16).
DNA Preparations-Plasmid DNA was prepared by the alkaline extraction method (17). Transformation of E. coli was made as described by Morrison (18). Restriction endonucleases, T4 DNA ligase (New England Biolabs), alkaline phosphatase, and T4 polynucleotide kinase (Boehringer-Mannheim) were used according to the suppliers' recommendations.
Isolation of the 2.15-kilobase DNA fragment containing the entire protein A gene was made by digesting the plasmid pSPA3 (10) with EcoRV. The digested material was electrophoresed on a 5% polyacrylamide gel, and the 2.15-kilobase fragment was eluted electrophoretically. The isolated fragment was passed over an anion exchange column, eluted, and precipitated with ethanol. The precipitated material was washed in 80% ethanol, dried, resuspended in water, and used for DNA sequence analyses.
DNA Sequencing Determinations-DNA fragments were sequenced by the method of Maxam and Gilbert (19) or Sanger et al. (20). The samples were analyzed on 6, 8, and 20% denaturing polyacrylamide gels using the thermostatic LKB Macrophor system.
Computer Analysis-All the sequencing analyses were performed on a Hewlett-Packard desktop computer (HP-85) equipped with a HP7225A plotter. The software was constructed by M. Uhlén.
RESULTS AND DISCUSSION
DNA Sequence-We have earlier reported that the protein A gene from S. aureus strain 8325-4 is located on a 1.8-kilobase insert of staphylococcal DNA cloned in the plasmid pBR322 (21). The plasmid was designated pSPA8 and is shown schematically in Fig. 1. Expression of the gene was demonstrated in E. coli. The sequence of the promoter region and the 5' end of the structural gene has been reported (10) as well as the sequence of the repetitive region X,¹ which probably is responsible for the cell wall binding of the protein in S. aureus.
Using the strategy outlined in Fig. 2C, the entire insert was sequenced according to the method of Maxam and Gilbert (19). It was not possible to obtain sequence on both strands in all parts of the gene, and therefore additional sequencing using the enzymatic method (20, 16) was performed in order to confirm the sequence in these parts. As no palindromic sequence indicating transcription termination was found in the 3' end of the gene, the sequence a few hundred nucleotides downstream from the EcoRV site on the original plasmid pSPA1 (10) was determined using both methods (19, 20). The complete nucleotide sequence of the protein A gene is shown in Fig. 3. Note that the previously published sequence of Löfdahl et al. (10) lacks one of the three thymidines at position 183-185.
¹ Guss, B., Uhlén, M., Nilsson, B., Lindberg, M., Sjöquist, J., and Sjödahl, J. (1984) Eur. J. Biochem., in press.
FIG. 1. Structure of plasmid pSPA8 with relevant restriction sites. The protein A gene is contained in a 1.8-kilobase TaqI-EcoRV insert in the plasmid pBR322. Boxes show the positions of the replication origin (ORI) and the genes coding for protein A (PROT A) and β-lactamase (AMP).
FIG. 2. Restriction map and sequencing strategy of the insert. A, schematic drawing of the gene coding for protein A with its different regions. S is a signal sequence, A-D are IgG-binding regions, E is a region homologous to A-D and X is the COOH-terminal part of protein A which lacks IgG-binding activity. B, partial restriction map of the corresponding DNA sequence (sites labeled TaqI, EcoRV, BclI, PstI, HindIII, Sau3A, RsaI, and EcoRI). C, sequencing strategy of the 1.8-kilobase insert.
Starting from a TTG codon at nucleotide 184, there is an open reading frame of 1,527 nucleotides terminating in a TAG stop codon at nucleotide 1,711. The preprotein, including the putative signal peptide, consists of 509 amino acids giving an Mr = 58,703. Although we have not shown that the codon at nucleotide 184 is the translational start, there are several reasons to postulate this. First, TTG is a common start codon in Gram-positive bacteria (21), unlike E. coli in which it is very rare (22). Second, this start codon gives a putative signal peptide with a reasonable size (36 amino acids) and structure (a few basic residues followed by a stretch of 23 hydrophobic residues). Third, this codon is preceded by a possible Shine-Dalgarno sequence (23) that has many features in common with other Gram-positive ribosomal binding sequences (24). Eight out of 11 nucleotides are complementary to the 3' end of B. subtilis 16 S rRNA, similar to other Gram-positive genes (25). In addition, the space between the last G in this sequence and the start codon is seven nucleotides, also similar to other Gram-positive genes (24, 25).
Two upstream overlapping promoter sequences similar to the consensus sequences (TTGACA and TATAAT) of prokaryotes (26) have been indicated in Fig. 3, although the first -35 sequence matches TTGACA relatively poorly (only three out of six positions). The gene is both preceded and followed by palindromic sequences indicating transcription terminations. These are indicated in Fig. 3, and the possible mRNA hairpin structures that can be formed are schematically drawn in Fig. 4. Both palindromes are followed by a T-rich stretch of residues (TTTATTTT). Although we do not have any experimental data to show where the transcription of the protein A mRNA starts or terminates, it thus appears likely that protein A is translated from a monocistronic mRNA.
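The terminator-like motif described above (an inverted repeat that can fold into a hairpin, followed by a T-rich stretch) can be searched for mechanically. The following is a minimal sketch, not the procedure used in this work: the function names, the stem and loop lengths, and the T-richness threshold are all assumptions chosen for illustration, only perfect stems are detected, and the test sequence is invented rather than taken from Fig. 3 or Fig. 4.

```python
# Hypothetical helper names; thresholds are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_terminator_like(seq, stem=6, min_loop=3, max_loop=8, tail=8, min_t=6):
    """Return (stem_start, loop_length) for perfect inverted repeats
    that are immediately followed by a T-rich tail.

    A hit is: `stem` bases, a loop of min_loop..max_loop bases, the
    reverse complement of the stem, then `tail` bases containing at
    least `min_t` T residues.
    """
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop - tail + 1):
        left = seq[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop                     # start of the right arm
            right = seq[j:j + stem]
            downstream = seq[j + stem:j + stem + tail]
            if len(downstream) != tail:             # ran off the end of seq
                break
            if right == reverse_complement(left) and downstream.count("T") >= min_t:
                hits.append((i, loop))
    return hits


# Invented test sequence: 5-base leader, 6-base stem, 4-base loop,
# complementary stem, T-rich tail, short trailer.
toy = "AAGAC" + "GCCGAT" + "TTCA" + "ATCGGC" + "TTTATTTT" + "ACGA"
print(find_terminator_like(toy))
```

A scan of this kind only flags candidates; whether a candidate actually terminates transcription would still need experimental support, as noted in the text.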
Amino Acid Sequence-The amino acid sequence deduced from the DNA sequence as well as amino acids that differ in the partial protein sequence established by Sjödahl (27) are also indicated in Fig. 3. Among the IgG-binding regions D, A, B, and C, a high degree of homology exists and only 4 out of the 235 amino acids comprising all four regions vary. All these changes can be explained by single point mutations. Since the DNA sequence was obtained from strain 8325-4 and the protein sequence from strain Cowan I, the divergence is probably due to strain variation. The partial amino acid sequence of region X also shows high similarity to the deduced sequence although about 10% of the amino acids are different.¹ The amino acid numbering starts with the alanine at nucleotide 292 which has been shown to be the first amino acid of the mature protein A.² The stop codon at nucleotide 1,711 thus gives a mature protein A of 473 amino acids and a resulting Mr = 52,752.
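The coordinates quoted in this section are mutually consistent, which a few lines of arithmetic can confirm. The sketch below is an added illustration with hypothetical names; it assumes that nucleotides 184 and 292 are the first bases of their codons and that the stop codon begins directly after the 1,527-nucleotide reading frame.

```python
# Coordinates quoted in the text (1-based nucleotide positions in Fig. 3).
START = 184          # first base of the putative TTG initiator codon
ORF_LENGTH = 1527    # nucleotides in the open reading frame
MATURE_START = 292   # first base of the codon for the first mature residue


def orf_summary(start, orf_length, mature_start):
    """Derive codon counts and boundaries from the quoted coordinates."""
    if orf_length % 3 or (mature_start - start) % 3:
        raise ValueError("coordinates are not in frame")
    return {
        "stop_codon_start": start + orf_length,
        "preprotein_residues": orf_length // 3,
        "signal_peptide_residues": (mature_start - start) // 3,
        "mature_residues": (start + orf_length - mature_start) // 3,
    }


summary = orf_summary(START, ORF_LENGTH, MATURE_START)
for key, value in summary.items():
    print(key, value)
```

The derived values (stop codon at 1,711, a 509-residue preprotein, a 36-residue signal peptide, and a 473-residue mature protein) agree with the figures given in the text.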
Amino Acid Composition-Attempts to determine the protein sequence of protein A have involved digestion of staphylococcal cell walls with lysostaphin (28) or analyzing protein A from mutant bacteria which secrete the product (8). In order to compare the sequences deduced from the DNA sequence with those obtained experimentally, the amino acid compositions of different parts of the protein, as deduced from the DNA sequence, are tabulated in Table I. The amino acid compositions of purified protein A from different strains of S. aureus are also presented in Table I. A direct comparison of structures from deduced and purified proteins is difficult, due to strain differences and proteolytic digestion during isolation of the protein. According to Sjödahl (27) and Lindmark et al. (8), there are only a few amino acids NH2-terminal of region D in protein A isolated from cell walls of Cowan I. However, the exact NH2-terminal sequence could not be obtained due to a blocked terminus (27). Table I shows that the size of the deduced protein from 8325-4 is larger than two independent determinations of the protein from Cowan I even if region E is omitted (A-E). At present, it is unclear if this difference in size and amino acid composition is due to proteolysis both in the NH2-terminal and COOH-terminal parts of the protein or if it reflects genomic differences. The protein A gene of Cowan I has recently been cloned in our laboratory, which will help to clarify this point.
² U. Hellman, unpublished results.
FIG. 4. Hypothetical secondary structures at the 5' and 3' regions flanking the protein A coding sequence. The numbers refer to nucleotides in Fig. 3.
In contrast, it appears likely that the secreted form of protein A from strain A676 does contain region E. The NH2-terminal sequence of this protein (8) fits well with the NH2-terminus of protein A from strain 8325-4 when determined both by Edman degradation of the purified protein² and by DNA sequence starting at nucleotide 292 in Fig. 3. The size of protein A from A676 would then indicate that the protein is truncated at the COOH-terminal, lacking approximately 80 amino acids. The amino acid composition, as deduced from the DNA sequence, of a mature protein A lacking 107 amino acids in the COOH-terminal part shows good agreement with the composition of purified protein A from strain A676 as shown in Table I. However, the DNA sequence does not contain the COOH-terminal -Val-Ala-Lys which has been reported for A676 (8).
TABLE I
Amino acid composition of deduced protein A gene or purified protein from different strains of S. aureus
Columns, in order: deduced protein A from 8325-4: Prot-A(a), Mat-A(b), A-E(c), A-X(d); purified protein A: Cowan I(e), Cowan I(f), A676(g)
Amino acids
Lysine          69 65 62 45 52 53 48
Histidine       7 7 6 3 4 4 3
Arginine        6 5 4 5 5 4 4
Aspartic acid   105 103 915 82 83 82
Threonine       10 7 7 2 5 6 4
Serine          252 18 207 16 16
Glutamic acid   78 78 67 68 650 64
Proline         31 30 27 24 2767
Glycine         33 28 268 30 302
Alanine         42 38 31 31 3461
Valine          15 12 10 4 5 8 7
Methionine      6 6 5 3 2 3 3
Isoleucine      18430 9 121
Leucine         4161 29 2787
Tyrosine        9 8 7 5 5 4 4
Phenylalanine   14 1424 1223
Total           509 4731766 381 39566
a Protein A including the signal peptide.
b Mature protein A, amino acids 1-473 in Fig. 3.
c Mature protein A except region E, amino acids 57-473.
d Mature protein A except COOH-terminal part, amino acids 1-366.
e From Movitz (28), isolated by lysostaphin treatment of bacteria.
f From Lindmark et al. (8), isolated by lysostaphin treatment of bacteria.
g From Lindmark et al. (8), extracellular protein A produced by a methicillin-resistant strain.
Codon Usage-The codon usage for the preprotein of protein A is compared in Table II with other Gram-positive genes. Chromosomal genes are represented by four Bacillus genes and plasmid-coded genes by the four putative proteins encoded by the staphylococcal plasmid vector pC194 (32). Also indicated by + or - are the codon pairs which, according to Grosjean and Fiers (33), are most likely to be preferred or not preferred, respectively, by highly expressed genes. Their hypothesis predicts that efficient in-phase translation is facilitated by proper choice of degenerate codewords, and the codon pairs marked in Table II are most dependent on maximal codon-anticodon interaction energy.
Table II shows that among the chromosomal genes the codon usage is randomly distributed. The per cent G/C of the degenerate third base is 42%, similar to the overall GC content of the Bacillus species involved, which is 42-47% (34). In contrast, the plasmid-coded genes have a marked preference for A/U bases, only 22% G/C. Although the repetitive nature of the protein A gene makes statistical analysis risky, it seems to exhibit a clear preference for third position A/U bases with a few exceptions, UUC (Phe), AAC (Asn), and AGC (Ser). Two of these exceptions can be explained by the Grosjean and Fiers (33) hypothesis. Furthermore, among the four codon pairs in which, according to the theory, selection for C is preferred, this nucleotide is indeed chosen 64% of the time (67/105). In contrast, the four codon pairs with predicted selection for U show a reversed ratio, and only 21% C (18/85) can be found. The GC content at the third base of the codons is 32%, similar to the GC content of chromosomal DNA from S. aureus which is 30-33% (34). Therefore, the codon usage of the protein A gene shows a preference for A/U bases adapting to the overall GC content of the host cell with some exceptions, mainly following the Grosjean-Fiers (33) rules for highly expressed genes.
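The third-base statistic used above is simple to compute from a table of codon counts. The sketch below is an added illustration, not the authors' software: the function name is hypothetical, the small dictionary of counts is invented for demonstration and is not a transcription of Table II, and the default list of skipped codons follows the rule stated for the table (AUG, UGG, and AUA omitted). The two ratios quoted in the text are also recomputed.

```python
def third_base_gc_percent(codon_counts, skip=("AUG", "UGG", "AUA")):
    """Per cent of counted codons whose third base is G or C.

    `codon_counts` maps RNA codons (e.g. "UUC") to occurrence counts.
    Codons listed in `skip` are left out of both numerator and denominator.
    """
    total = 0
    gc = 0
    for codon, n in codon_counts.items():
        if codon in skip:
            continue
        total += n
        if codon[2] in "GC":
            gc += n
    if total == 0:
        raise ValueError("no codons to count")
    return 100.0 * gc / total


# Invented counts for demonstration only.
toy_counts = {"UUU": 2, "UUC": 12, "AAA": 51, "AAG": 18,
              "AUG": 6, "GCU": 25, "GCC": 1}
print(round(third_base_gc_percent(toy_counts), 1))

# Ratios quoted in the text for the Grosjean-Fiers codon pairs.
print(round(100 * 67 / 105), round(100 * 18 / 85))
```

Skipping AUG and UGG is natural because those amino acids have a single codon, so their third base carries no information about preference.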
TABLE II
             Prot-A(a)  Chrom(b)Plasmid(c)   Pref(d)
Phe  UUU         2        45        39
     UUC        12        20        11
Leu  UUA        20        34        35
     UUG         5        22        13
     CUU         7        31        10
     CUC         1         7         4
     CUA         6         3         5
     CUG         2        31         4
Ile  AUU         8        38        27
     AUC         9        30         5
     AUA         1        12        18
Met  AUG         6        29        12
Val  GUU         5        21        12
     GUC         2        21         1
     GUA         6        21        14
     GUG         2        30         4
Ser  UCU         5        20        16
     UCC         0        21         1
     UCA         3        31         7
     UCG         2        22         4
Pro  CCU        21        16        10
     CCC         0        11         5
     CCA         8        11         3
     CCG         2        25         1
Thr  ACU         5        13        14
     ACC         1        16         4
     ACA         4        48        15
     ACG         0        45         5
Ala  GCU        25        29         9
     GCC         1        36         1
     GCA        11        40         6
     GCG         5        38         1
Tyr  UAU         8        49        29
     UAC         1        33         9
Term UAA         0
     UAG         0
His  CAU         6        27        17
     CAC         1         8         1
Gln  CAA        38                  16
     CAG         2        35         6
Asn  AAU        20        68        43
     AAC        45                  12
Lys  AAA        51        79        56
     AAG        18        26        12
Asp  GAU        21        81        22
     GAC        19                   5
Glu  GAA        37                  19
     GAG         1        35        10
Cys  UGU         0         2         7
     UGC         0         2         4
Term UGA         0
Trp  UGG         0        35         9
Arg  CGU         3        18         4
     CGC         3         5         1
     CGA         0        10         3
     CGG         0         9         0
Ser  AGU         3        19        13
     AGC        12                   3
Arg  AGA         0        11        11
     AGG         0        14         4
Gly  GGU        18        22        11
     GGC        14                   2
     GGA         1        46         3
     GGG         0        20         3
Sum            509      1654       655
Per cent G/C(e) 32        42        22
a Protein A including the signal peptide (preprotein).
b The sum of four Bacillus chromosomal genes, B. amyloliquefaciens α-amylase (25), B. subtilis α-amylase (29), B. subtilis SpoOF (30), and B. licheniformis penicillinase (31).
c Four putative proteins of pC194 (32). As the start codons are yet to be identified, the total open reading frames are taken into account.
d The eight codon pairs which are most likely to be preferred (+) or not preferred (-) by highly expressed genes (33).
e Per cent G/C in the third degenerate base. The codons AUG (Met), UGG (Trp), and AUA (Ile) are omitted.
Homology Plot Analysis-In order to search for homologous regions, the DNA sequence and its deduced amino acid sequence were scanned by a computer program. Every point in the homology plots represents an identical residue (1). The nucleotide triplets and the deduced amino acids are compared in Fig. 5, A and B, respectively. As the sequence is compared with itself, a line of identity occurs from the left upper corner to the right lower corner, and homologous repeats show up as parallel lines, which disappear when no homology exists. The plots reveal two structurally distinct regions with internal homology, flanked by unique sequences without homology in the 5' and the 3' ends of the structural gene. Thus, the part of the gene coding for the signal peptide (S) as well as the promoter region (5') seems to be totally unrelated to the IgG-binding regions (E, D, A, B, and C) located in the middle of the gene. The part of the gene coding for the COOH-terminal part of region X as well as the 3' flanking sequence seems to be unrelated to both the repetitious region X and the IgG-binding regions. Comparisons between the plots show that the homology lines in Fig. 5A are more broken than those in Fig. 5B, which means that many of the nucleotide changes between the codons in the homologous regions have occurred in bases giving no amino acid change. These results strongly support the previously suggested hypothesis (27) of an evolutionary pressure in these regions keeping the amino acid sequence preserved.
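A self-comparison plot of the kind described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reconstruction of the HP-85 program used in this work: the function names are hypothetical, a dot is placed wherever two three-residue windows are identical, and the test sequence is an invented repeat rather than the protein A gene.

```python
def dot_matrix(seq, window=3):
    """Return the set of (i, j) where seq[i:i+window] == seq[j:j+window].

    Comparing a sequence with itself always yields the main diagonal
    (i == j); internal repeats add parallel off-diagonal lines.
    """
    positions = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)
    dots = set()
    for starts in positions.values():
        for i in starts:
            for j in starts:
                dots.add((i, j))
    return dots


def diagonal_offsets(dots):
    """Count dots on each diagonal above the main one, keyed by offset j - i."""
    counts = {}
    for i, j in dots:
        if j > i:
            counts[j - i] = counts.get(j - i, 0) + 1
    return counts


# Invented sequence: an 8-base unit repeated three times plus a unique tail.
toy = "ACGTTGCA" * 3 + "GGATCC"
dots = dot_matrix(toy)
offsets = diagonal_offsets(dots)
best = max(offsets, key=offsets.get)
print(best, offsets[best])
```

The most populated off-diagonal corresponds to the repeat length, which is how a tandem repeat shows up as a set of parallel lines in such a plot. The same functions work on a protein sequence given as a one-letter string.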
Structure of IgG-binding Regions-The IgG-binding regions of protein A have been defined by trypsin cleavage of the mature protein into functional IgG-binding units D, A, B, and C (7, 27). Recently, we showed (10) that strain 8325-4 also contains a fifth region E homologous to the four repetitive regions earlier identified by protein sequencing. In Fig. 6 the sequences of the regions are aligned to enable comparisons. In order to achieve maximal homology, the boundary of these regions has been moved 15 nucleotides towards the 3' end of the gene. This choice is of course arbitrary as the 5' end and the 3' end of the repetitive region have diverged slightly. However, although the last five amino acids of region C (292-296) are changed compared to region B, more than half of the nucleotides (8/15) are homologous, indicating a relationship. The same holds for the other end of the repetitive region located in the beginning of region E. Although the first three amino acids are different from region D, five out of nine nucleotides are identical. The cleavage points for trypsin are marked with arrows. There exists a nine-nucleotide insertion in region E giving three amino acid residues (59-61) not homologous to the other regions. Also shown in Fig. 6 are the sequences flanking the repetitive regions. As already pointed out in the homology analysis (Fig. 5, A and B) these regions seem to be nonhomologous to the IgG-binding regions.
A changed nucleotide compared to region B in Fig. 6 is marked with an asterisk, and a changed amino acid is underlined. Table III summarizes the amino acid changes and Table IV the codon changes between the regions. A comparison of the five regions with respect to mutual relationship reveals a pronounced homology gradient along the protein molecule, i.e. the closer the location of two regions, the higher the degree of homology. As already pointed out by Sjödahl (27), one interpretation of this phenomenon is that the primordial structural gene coding for the IgG-binding part of protein A has been subjected to stepwise gene duplications involving only one region followed by a period in which point mutations have occurred, thus generating slightly dissimilar nucleotide and amino acid sequences. As a result of these evolutionary events, a homology gradient will evolve. The fact that codons (Table IV) have changed much faster than amino acids (Table III) indicates that an evolutionary pressure exists to keep the amino acid sequence preserved. Since the number of total changes of codons is lowest for region B (Table IV), this region was chosen for the comparison in Fig. 6.
Structural studies of protein A have suggested that 11 amino acids of the IgG-binding regions are essential for binding to the Fc part of the immunoglobulins (35). Most of these amino acids are assumed to be located in two α-helical regions (35). In region B, the corresponding residues are 183-192 and 198-211. As seen in Fig. 6, there are striking homologies in these two α-helices between the different regions, suggesting an evolutionary pressure to keep these residues intact. The changes observed are often outside the two helical areas, for instance, the changed His-Leu, at position 193-194 of region B, to Asn-Met, in regions E, D, and A. This pressure is even more pronounced when comparing the residues in these α-helices that interact with IgG. In region B, these amino acids are 184-186 (Gln-Gln-Asn), 188-189 (Phe-Tyr), 192 (Leu), 203 (Asn), 206-207 (Ile-Glu), and 210 (Lys). As seen in Fig. 6, there is a serine instead of asparagine at position 70, but all the other 49 residues are identical. Clearly, there is a strong pressure to keep these amino acids preserved.
Apart from the mutual homology between the five regions, there also seem to exist internal homologies in each region as revealed by traces of lines in Fig. 5, A and B. Hence, the nucleotide sequence coding for amino acids 179 (Lys) to 188 (Phe) and 196 (AAC) to 205 (Phe), all within region B, contains 24 identical out of 30 nucleotides. Another subregion of interest is the nine-nucleotide insert, giving the amino acids 59-61, which has been observed in protein A both from S. aureus Cowan I and 8325-4. This subregion (residues 57-62) is possibly related to other regions like amino acids 4-9 in the beginning of region E. A comparison nucleotide by nucleotide reveals that 14 out of 18 bases are identical between these two regions.
FIG. 5. Dot matrix comparisons of the protein A sequence. A, the entire nucleotide sequence and the immediate 5' and 3' flanking sequences are compared with itself. Each dot represents the center of a three-base identity, and direct repeats appear as parallel lines across the grid. B, the deduced amino acid sequence compared with itself.
FIG. 6. Comparisons of the IgG-binding regions and flanking regions. The sequences of the repetitive regions have been aligned to achieve maximal homology. The comparison is based on region B, and a nucleotide is marked with an asterisk and an amino acid is underlined when different from the B region. The cleavage points for trypsin are marked with arrows.
TABLE III
Comparison of amino acids of the IgG-binding regions
The values listed represent the number of changed amino acids of identically positioned residues when the regions are compared in pairs.
Region     E     D     A     B     C   Total
E          0    11    12    14    21      57
D         11     0     7    11    17      46
A         12     7     0     5    15      40
B         14    11     5     0    10      41
C         21    17    15    10     0      64
TABLE IV
Comparison of codons of the IgG-binding regions
The values listed represent the number of changed nucleotide triplets of identically positioned codons when the regions are compared in pairs.
Region     E     D     A     B     C   Total
E          0    31    25    26    36     118
D         31     0    21    25    28     105
A         25    21     0    14    30     101
B         26    25    14     0    20      86
C         36    28    30    20     0     115
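The contrast drawn in this section between codon changes and amino acid changes can be made concrete with a short pairwise comparison. The sketch below is an added illustration with hypothetical names: it assumes the standard genetic code, requires two regions that are already aligned and in frame, and uses two invented 12-nucleotide fragments rather than the actual sequences of Fig. 6.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def compare_regions(dna_a, dna_b):
    """Return (changed codons, changed amino acids) for two aligned regions.

    A codon that differs but encodes the same amino acid is a silent
    change: it raises the first count but not the second.
    """
    if len(dna_a) != len(dna_b) or len(dna_a) % 3:
        raise ValueError("regions must be aligned, equal in length, and in frame")
    codon_changes = 0
    aa_changes = 0
    for k in range(0, len(dna_a), 3):
        codon_a = dna_a[k:k + 3]
        codon_b = dna_b[k:k + 3]
        if codon_a != codon_b:
            codon_changes += 1
            if CODON_TABLE[codon_a] != CODON_TABLE[codon_b]:
                aa_changes += 1
    return codon_changes, aa_changes


# Invented fragments: Lys-Glu-Ala-Gln versus Lys-Glu-Ala-His.
region_1 = "AAAGAAGCTCAA"
region_2 = "AAGGAAGCACAT"
print(compare_regions(region_1, region_2))
```

Applied to every pair of regions, the two counts would populate matrices laid out like Tables III and IV; a large excess of codon changes over amino acid changes is the signature of selection acting on the protein sequence.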
Structure of Region X-The repetitive nature of region X is indicated as multiple lines in Fig. 5, A and B, giving an approximately 300-base pair repetitive region (Xr) followed by a constant region coding for 81 amino acids (Xc). In Fig. 7, the 24-nucleotide repeats are aligned and a mutual comparison was performed. Again, a changed nucleotide is marked with an asterisk, and a changed amino acid is underlined. The 3' end of the repetitive region is obviously located at amino acid 392 (see Fig. 7) which is directly followed by the constant region. Since region C terminates at amino acid 296, the repetitive part of region X consists of exactly 12 units each with a length of 24 nucleotides. The boundary between region C and region X is, however, not clearly defined since the 12 last nucleotides, coding for the last four amino acids of region C, are identical with those coding for the corresponding amino acids of region X1 (Fig. 7).
Structural studies based on the cleavage with trypsin (7, 27) have suggested that region X starts at amino acid 292 which differs five amino acids from the boundary chosen in Fig. 7. As discussed above, the end of region C is probably related to the other IgG-binding regions, but this region has obviously diverged in the COOH-terminal end, generating a few amino acids identical with region X1. Therefore, structurally the octapeptide of region X seems to be repeated 12.5 times.
A comparison of the 12 repeated units reveals striking homologies. The six first amino acids (Lys-Pro-Gly-Lys-Glu-Asp) are identical throughout the Xr region. The two last amino acids are changed in a regular pattern between Asn-Asn, Gly-Asn, or Asn-Lys. Although the biological function of this extremely conserved octapeptide is not known, clearly there has been a strong pressure to preserve its amino acid sequence. Hence, 12 nucleotides have changed when comparing the six conserved amino acids in the 12 Xr compartments, all occurring in a wobble position and therefore representing silent mutations.
Apart from the distinct 24-nucleotide repeat, there are also signs of a 48-nucleotide repeat. Thus, the wobble base A/G in the codon coding for the first lysine is changed periodically in regions X7 to X12, and amino acid 7 is changed periodically between Asn and Gly in regions 5 to 10 (see Fig. 7).
There also seems to be some evidence for a homology gradient throughout the X region, although the gradient must be based on a 48-nucleotide repeat rather than the primordial 24-nucleotide sequence.
In conclusion, the evolution of the repetitive part of region X probably involved stepwise gene duplications of an ancestral 24- or 48-nucleotide long sequence. How this evolved at the molecular level is unclear, but the nucleotide sequence of the protein A gene from other strains, as well as genes coding for proteins with similar repeated structures, may help in resolving the molecular events causing stepwise multiple DNA duplications.
FIG. 7. Comparison of the repetitive units of region X and flanking regions. The sequences of the repetitive region have been aligned to achieve maximal homology. The comparison is based on region X1, and an altered nucleotide is marked with an asterisk and an altered amino acid is underlined. The cleavage point for trypsin which defines region X (7, 27) is immediately before amino acid 292 (Glu). The numbers refer to the amino acids in Fig. 3.
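The distinction between a 24-nucleotide and a 48-nucleotide period can be illustrated by scoring a sequence against a shifted copy of itself. The sketch below is an added example with hypothetical names; the two 24-nucleotide units are invented (they encode an octapeptide of the general form described in the text but are not taken from Fig. 7), and they differ only at one wobble base and in one codon, so that the exact period is 48 while the period of 24 is approximate.

```python
def identity_at_offset(seq, offset):
    """Fraction of positions i for which seq[i] == seq[i + offset]."""
    if offset not in range(1, len(seq)):
        raise ValueError("offset must be between 1 and len(seq) - 1")
    pairs = len(seq) - offset
    matches = sum(seq[i] == seq[i + offset] for i in range(pairs))
    return matches / pairs


# Invented units, 8 codons (24 nucleotides) each.
unit_a = "AAACCAGGTAAAGAAGACAACAAC"   # Lys-Pro-Gly-Lys-Glu-Asp-Asn-Asn
unit_b = "AAGCCAGGTAAAGAAGACGGCAAC"   # Lys-Pro-Gly-Lys-Glu-Asp-Gly-Asn
toy = (unit_a + unit_b) * 3           # alternating units: exact period of 48

print(identity_at_offset(toy, 24))
print(identity_at_offset(toy, 48))

# Consistency check on the unit count quoted in the text:
# amino acids 297 through 392 form 12 units of 8 residues (24 nucleotides).
residues = 392 - 297 + 1
print(residues, residues // 8, residues * 3)
```

A higher identity at an offset of 48 than at 24 is the kind of signal that would suggest duplication of a two-unit block rather than of single units.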
Acknowledgments-We are grateful to Dr. John Sjöquist for critical comments and advice. We thank Hans-Olof Pettersson and Björn Jansson for skillful technical assistance and Christina Pellettieri and Gerd Benson for patient secretarial help. We also thank Dr. Andras Gaal for introducing us to the thermostatic LKB Macrophor system and Dr. Stephen Fahnestock for a correction of the nucleotide sequence.
REFERENCES
1. Jeffreys, A. J. (1981) in Genetic Engineering (Williamson, R., ed) Vol. 2, pp. 1-48, Academic Press, New York
2. Fischetti, V. A., and Manjula, B. N. (1982) Semin. Infect. Dis. 4, 411-418
3. Hirano, H., Yamada, Y., Sullivan, M., de Crombrugghe, B., Pastan, I., and Yamada, K. M. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 46-50
4. Ohno, S. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 7657-7661
5. Hartley, J. L., and Gregori, T. J. (1981) Gene (Amst.) 13, 347-353
6. Tanaka, T. (1979) J. Bacteriol. 139, 775-782
7. Sjödahl, J. (1977) Eur. J. Biochem. 73, 343-351
8. Lindmark, R., Movitz, J., and Sjöquist, J. (1977) Eur. J. Biochem. 74, 623-628
9. Beachey, E. H., Seyer, J. M., and Kang, A. H. (1982) Semin. Infect. Dis. 4, 401-410
10. Löfdahl, S., Guss, B., Uhlén, M., Philipson, L., and Lindberg, M. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 697-701
11. Langone, J. J. (1982) Adv. Immunol. 32, 157-252
12. Boyer, H. W., and Roulland-Dussoix, D. (1969) J. Mol. Biol. 41, 459-472
13. Marinus, M. G. (1973) Mol. Gen. Genet. 127, 47-55
14. Bolivar, F., Rodriguez, R. L., Greene, P. J., Betlach, M. C., Heyneker, H. L., Boyer, H. W., Crosa, J. H., and Falkow, S. (1977) Gene (Amst.) 2, 95-113
15. Roberts, T. M., Swanberg, S. L., Poteete, A., Riedel, G., and Backman, K. (1980) Gene (Amst.) 12, 123-127
16. Dente, L., Cesareni, G., and Cortese, R. (1983) Nucleic Acids Res. 11, 1645-1655
17. Birnboim, H. C., and Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523
18. Morrison, D. A. (1979) Methods Enzymol. 68, 326-331
19. Maxam, A. M., and Gilbert, W. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 560-564
20. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
21. Uhlén, M., Nilsson, B., Guss, B., Lindberg, M., Gatenbeck, S., and Philipson, L. (1983) Gene (Amst.) 23, 369-378
22. Kozak, M. (1983) Microbiol. Rev. 47, 1-45
23. Shine, J., and Dalgarno, L. (1975) Nature (Lond.) 254, 34-38
24. McLaughlin, J. R., Murray, C. L., and Rabinowitz, J. C. (1981) J. Biol. Chem. 256, 11283-11291
25. Takkinen, K., Pettersson, R. F., Kalkkinen, N., Palva, I., Söderlund, H., and Kääriäinen, L. (1983) J. Biol. Chem. 258, 1007-1013
26. Johnson, W. C., Moran, C. P., and Losick, R. (1983) Nature (Lond.) 302, 800-804
27. Sjödahl, J. (1977) Eur. J. Biochem. 78, 471-490
28. Movitz, J. (1976) Eur. J. Biochem. 68, 291-299
29. Yang, M., Galizzi, A., and Henner, D. (1983) Nucleic Acids Res. 11, 237-249
30. Shimotsu, H., Kawamura, F., Kobayashi, Y., and Saito, H. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 658-662
31. Neugebauer, K., Sprengel, R., and Schaller, H. (1981) Nucleic Acids Res. 9, 2577-2588
32. Horinouchi, S., and Weisblum, B. (1982) J. Bacteriol. 150, 815-825
33. Grosjean, H., and Fiers, W. (1982) Gene (Amst.) 18, 199-209
34. Fasman, G. D. (ed) (1976) CRC Handbook of Biochemistry and Molecular Biology: Nucleic Acids Section, 3rd Ed., Vol. II, pp. 69-183, CRC Press, Inc., Boca Raton, FL
35. Deisenhofer, J. (1981) Biochemistry 20, 2361-2370