J. Biol. Chem.-1984-Uhlén-1695-702

7/24/2019 J. Biol. Chem.-1984-Uhln-1695-702


THE JOURNAL OF BIOLOGICAL CHEMISTRY
© 1984 by The American Society of Biological Chemists, Inc.
Vol. 259, No. 3, Issue of February, pp. 1695-1702, 1984
Printed in U.S.A.

Complete Sequence of the Staphylococcal Gene Encoding Protein A

A GENE EVOLVED THROUGH MULTIPLE DUPLICATIONS*

(Received for publication, August 4, 1983)

Mathias Uhlén, Bengt Guss, Bjorn Nilsson¶, Sten Gatenbeck, Lennart Philipson‖, and Martin Lindberg**

From the Department of Biochemistry, Royal Institute of Technology, S 100 44 Stockholm, Sweden and the Department of Microbiology, University of Uppsala, The Biomedical Center, Box 581, S 751 3 Uppsala, Sweden

The gene coding for protein A from Staphylococcus aureus has been isolated by molecular cloning, and a subclone containing a 1.8-kilobase insert was found to give a functional protein A in Escherichia coli. The complete nucleotide sequence of the insert, including the structural gene and the 5' and 3' flanking sequences, has been determined. Starting from a TTG initiator codon, an open reading frame comprising 1527 nucleotides gives a preprotein of 509 amino acids and a predicted M_r = 58,703. The structural gene is flanked on both sides by palindromic structures followed by a stretch of T residues, suggesting transcriptional termination signals. Thus, it appears that protein A is translated from a monocistronic mRNA. The sequence reveals extensive internal homologies involving a 58-amino acid unit, responsible for IgG binding, repeated 5 times and an 8-amino acid unit, possibly responsible for binding to the cell wall of S. aureus, repeated 12 times. Comparisons between the repeated regions show a marked preference for silent mutations, indicating an evolutionary pressure to keep the amino acid sequence preserved. The structure of the gene also suggests how the gene has evolved.

Evolution by gene duplication is a well known phenomenon among eukaryotic genes. The globin clusters, the immunoglobulins, and the interferon genes probably all have ancestral genes which have been duplicated and then diverged into functionally distinct genes (1). Examples of internally repetitive sequences have also been reported; rabbit skeletal tropomyosin contains a 7-residue amino acid periodicity throughout the molecule (2), and similar repeats have been reported for chicken fibronectin (3) and mammalian serum albumin (4). Among prokaryotes, most reports of duplicated genes have involved in vitro constructions (5), which seem to be stable in Escherichia coli, but dramatically unstable in Bacillus subtilis (6). However, the amino acid sequences of a few cell wall-bound proteins from Gram-positive bacteria have revealed remarkable periodicity, i.e. staphylococcal protein A (7, 8) and streptococcal M protein (9).

We have earlier reported on the molecular cloning of the gene for staphylococcal protein A in E. coli (10). This protein interacts with the Fc (constant part of immunoglobulins) domain of several immunoglobulins from many species including man and has therefore been used extensively for quantitative and qualitative immunological techniques (11). Amino acid sequence analysis of protein A revealed two functionally distinct regions of the molecule (7, 8). Both regions have remarkably repetitive structures. The NH2-terminal part contains four or five homologous IgG-binding units consisting of approximately 58 amino acids each. The COOH-terminal part, which is thought to bind to the cell wall of Staphylococcus aureus, consists of several repeats of an octapeptide (Glu-Asp-Gly-Asn-Lys-Pro-Gly-Lys) (8).

In a previous report (10), we determined the nucleotide sequence of the promoter region, as well as the region coding for the NH2-terminal part of the protein. Here we report the complete nucleotide sequence of the protein A gene including the 5' and 3' flanking regions from the S. aureus strain 8325-4. The structural gene is 1,527 nucleotides long, giving a preprotein consisting of 509 amino acids and an M_r = 58,703. The repetitive structure of the gene has been clarified, which suggests how the gene has evolved.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

¶ Supported by grants from the Swedish National Board for Technical Development.

‖ Present address, European Molecular Biology Laboratory, Heidelberg, Federal Republic of Germany.

** Supported by grants from the Swedish Medical Research Council and Pharmacia Fine Chemicals, Uppsala.

EXPERIMENTAL PROCEDURES

Bacterial Strains and Plasmids-E. coli strains HB101 (12) and GM161 (13) were used as bacterial hosts. The plasmid vectors were pBR322 (14), pTR262 (15), and pEMBL9 (16).

DNA Preparations-Plasmid DNA was prepared by the alkaline extraction method (17). Transformation of E. coli was made as described by Morrison (18). Restriction endonucleases, T4 DNA ligase (New England Biolabs), alkaline phosphatase, and T4 polynucleotide kinase (Boehringer-Mannheim) were used according to the suppliers' recommendations.

Isolation of the 2.15-kilobase DNA fragment containing the entire protein A gene was made by digesting the plasmid pSPA3 (10) with EcoRV. The digested material was electrophoresed on a 5% polyacrylamide gel, and the 2.15-kilobase fragment was eluted electrophoretically. The isolated fragment was passed over an anion exchange column, eluted, and precipitated with ethanol. The precipitated material was washed in 80% ethanol, dried, resuspended in water, and used for DNA sequence analyses.

DNA Sequencing Determinations-DNA fragments were sequenced by the method of Maxam and Gilbert (19) or Sanger et al. (20). The samples were analyzed on 6, 8, and 20% denaturing polyacrylamide gels using the thermostatic LKB Macrophor system.

Computer Analysis-All the sequencing analyses were performed on a Hewlett-Packard desktop computer (HP-85) equipped with an HP7225A plotter. The software was constructed by M. Uhlén.

RESULTS AND DISCUSSION

DNA Sequence-We have earlier reported that the protein A gene from S. aureus strain 8325-4 is located on a 1.8-kilobase insert of staphylococcal DNA cloned in the plasmid pBR322 (21). The plasmid was designated pSPA8 and is shown schematically in Fig. 1. Expression of the gene was demonstrated in E. coli. The sequence of the promoter region and the 5' end of the structural gene has been reported (10) as well as the sequence of the repetitive region X which probably is responsible for the cell wall binding of the protein in S. aureus.

FIG. 1. Structure of plasmid pSPA8 with relevant restriction sites. The protein A gene is contained in a 1.8-kilobase TaqI-EcoRV insert in the plasmid pBR322. Boxes show the positions of the replication origin (ORI) and the genes coding for protein A (PROT A) and β-lactamase (AMP).

FIG. 2. Restriction map and sequencing strategy of the insert. A, schematic drawing of the gene coding for protein A with its different regions. S is a signal sequence, A-D are IgG-binding regions, E is a region homologous to A-D, and X is the COOH-terminal part of protein A which lacks IgG-binding activity. B, partial restriction map of the corresponding DNA sequence. C, sequencing strategy of the 1.8-kilobase insert.

Using the strategy outlined in Fig. 2C, the entire insert was sequenced according to the method of Maxam and Gilbert (19). It was not possible to obtain sequence on both strands in all parts of the gene, and therefore additional sequencing using the enzymatic method (20, 16) was performed in order to confirm the sequence in these parts. As no palindromic sequence indicating transcription termination was found in the 3' end of the gene, the sequence a few hundred nucleotides downstream from the EcoRV site on the original plasmid pSPA1 (10) was determined using both methods (19, 20). The complete nucleotide sequence of the protein A gene is shown in Fig. 3. Note that the previously published sequence of Lofdahl et al. (10) lacks one of the three thymidines at position 183-185.

¹ Guss, B., Uhlén, M., Nilsson, B., Lindberg, M., Sjoquist, J., and Sjodahl, J. (1984) Eur. J. Biochem., in press.

Starting from a TTG codon at nucleotide 184, there is an open reading frame of 1,527 nucleotides terminating in a TAG stop codon at nucleotide 1,711. The preprotein, including the putative signal peptide, consists of 509 amino acids giving an M_r = 58,703. Although we have not shown that the codon at nucleotide 184 is the translational start, there are several reasons to postulate this. First, TTG is a common start codon in Gram-positive bacteria (21), unlike E. coli in which it is very rare (22). Second, this start codon gives a putative signal peptide with a reasonable size (36 amino acids) and structure (a few basic residues followed by a stretch of 23 hydrophobic residues). Third, this codon is preceded by a possible Shine-Dalgarno sequence (23) that has many features in common with other Gram-positive ribosomal binding sequences (24). Eight out of 11 nucleotides are complementary to the 3' end of B. subtilis 16 S rRNA, similar to other Gram-positive genes (25). In addition, the space between the last G in this sequence and the start codon is seven nucleotides, also similar to other Gram-positive genes (24, 25).

Two upstream overlapping promoter sequences similar to the consensus sequences (TTGACA and TATAAT) of prokaryotes (26) have been indicated in Fig. 3, although the first -35 sequence shows relatively poor agreement (only three out of six positions) with TTGACA. The gene is both preceded and followed by palindromic sequences indicating transcription terminations. These are indicated in Fig. 3, and the possible mRNA hairpin structures that can be formed are schematically drawn in Fig. 4. Both palindromes are followed by a T-rich stretch of residues (TTTATTTT). Although we do not have any experimental data to show where the transcription of the protein A mRNA starts or terminates, it thus appears likely that protein A is translated from a monocistronic mRNA.

Amino Acid Sequence-The amino acid sequence deduced from the DNA sequence as well as amino acids that differ in the partial protein sequence established by Sjodahl (27) are also indicated in Fig. 3. Among the IgG-binding regions D, A, B, and C, a high degree of homology exists and only 4 out of the 235 amino acids comprising all four regions vary. All these changes can be explained by single point mutations. Since the DNA sequence was obtained from strain 8325-4 and the protein sequence from strain Cowan I, the divergence is probably due to strain variation. The partial amino acid sequence of region X also shows high similarity to the deduced sequence although about 10% of the amino acids are different.¹ The amino acid numbering starts with the alanine at nucleotide 292 which has been shown to be the first amino acid of the mature protein A.² The stop codon at nucleotide 1,711 thus gives a mature protein A of 473 amino acids and a resulting M_r = 52,752.
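The coordinates quoted above are mutually consistent, and the check is simple enough to sketch. The snippet below is only an illustration: the function name and dictionary keys are hypothetical, and the only inputs are the three numbers stated in the text (TTG start at nucleotide 184, a 1,527-nucleotide reading frame, and the first codon of the mature protein at nucleotide 292).

```python
# Coordinates quoted in the text (1-based nucleotide numbering of Fig. 3).
ORF_START = 184        # first base of the TTG initiator codon
ORF_LENGTH_NT = 1527   # open reading frame, stop codon not included
MATURE_START = 292     # first base of the codon for the NH2-terminal alanine


def orf_summary(orf_start, orf_length_nt, mature_start):
    """Derive codon counts and the stop-codon position from ORF coordinates."""
    if orf_length_nt % 3 or (mature_start - orf_start) % 3:
        raise ValueError("coordinates are not in the same reading frame")
    preprotein_aa = orf_length_nt // 3
    signal_aa = (mature_start - orf_start) // 3
    return {
        "stop_codon_at": orf_start + orf_length_nt,  # first base of the stop codon
        "preprotein_aa": preprotein_aa,
        "signal_peptide_aa": signal_aa,
        "mature_aa": preprotein_aa - signal_aa,
    }


summary = orf_summary(ORF_START, ORF_LENGTH_NT, MATURE_START)
print(summary)
```

Under these assumptions the derived values agree with the text: a stop codon at nucleotide 1,711, a 509-residue preprotein, a 36-residue signal peptide, and a 473-residue mature protein.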

Amino Acid Composition-Attempts to determine the protein sequence of protein A have involved digestion of staphylococcal cell walls with lysostaphin (28) or analyzing protein A from mutant bacteria which secrete the product (8). In order to compare the sequences deduced from the DNA sequence with those obtained experimentally, the amino acid compositions of different parts of the protein, as deduced from the DNA sequence, are tabulated in Table I. The amino acid compositions of purified protein A from different strains of S. aureus are also presented in Table I. A direct comparison of structures from deduced and purified proteins is difficult, due to strain differences and proteolytic digestion during isolation of the protein. According to Sjodahl (27) and Lindmark et al. (8), there are only a few amino acids NH2-terminal of region D in protein A isolated from cell walls of Cowan I. However, the exact NH2-terminal sequence could not be obtained due to a blocked terminus (27). Table I shows that the size of the deduced protein from 8325-4 is larger than two independent determinations of the protein from Cowan I even if region E is omitted (A-E). At present, it is unclear if this difference in size and amino acid composition is due to proteolysis both in the NH2-terminal and COOH-terminal parts of the protein or if it reflects genomic differences. The protein A gene of Cowan I has recently been cloned in our laboratory, which will help to clarify this point.

In contrast, it appears likely that the secreted form of protein A from strain A676 does contain region E. The NH2-terminal sequence of this protein (8) fits well with the NH2 terminus of protein A from strain 8325-4 when determined both by Edman degradation of the purified protein² and by DNA sequence starting at nucleotide 292 in Fig. 3. The size of protein A from A676 would then indicate that the protein is truncated at the COOH terminus, lacking approximately 80 amino acids. The amino acid composition, as deduced from the DNA sequence, of a mature protein A lacking 107 amino acids in the COOH-terminal part shows good agreement with the composition of purified protein A from strain A676 as shown in Table I. However, the DNA sequence does not contain the COOH-terminal -Val-Ala-Lys which has been reported for A676 (8).

² U. Hellman, unpublished results.

[FIG. 3, the nucleotide sequence of the protein A gene with the deduced amino acid sequence, is not reproduced here.]

FIG. 4. Hypothetical secondary structures at the 5' and 3' regions flanking the protein A coding sequence. The numbers refer to nucleotides in Fig. 3.

TABLE I
Amino acid composition of deduced protein A gene or purified protein from different strains of S. aureus

Columns, in order: deduced protein A from strain 8325-4 (Prot-A^a, Mat-A^b, A-E^c, A-X^d), then purified protein A (Cowan I^e, Cowan I^f, A676^g).

Amino acids      Values
Lysine           69 65 62 45 52 53 48
Histidine        7 7 6 3 4 4 3
Arginine         6 5 4 5 5 4 4
Aspartic acid    105 103 915 82 83 82
Threonine        10 7 7 2 5 6 4
Serine           252 18 207 16 16
Glutamic acid    78 78 67 68 650 64
Proline          31 30 27 24 2767
Glycine          33 28 268 30 302
Alanine          42 38 31 31 3461
Valine           15 12 10 4 5 8 7
Methionine       6 6 5 3 2 3 3
Isoleucine       18430 9 121
Leucine          4161 29 2787
Tyrosine         9 8 7 5 5 4 4
Phenylalanine    14 1424 1223
Total            509 4731766 381 39566

a Protein A including the signal peptide.
b Mature protein A, amino acids 1-473 in Fig. 3.
c Mature protein A except region E, amino acids 57-473.
d Mature protein A except COOH-terminal part, amino acids 1-366.
e From Movitz (28), isolated by lysostaphin treatment of bacteria.
f From Lindmark et al. (8), isolated by lysostaphin treatment of bacteria.
g From Lindmark et al. (8), extracellular protein A produced by a methicillin-resistant strain.

Codon Usage-The codon usage for the preprotein of protein A is compared in Table II with other Gram-positive genes. Chromosomal genes are represented by four Bacillus genes and plasmid-coded genes by the four putative proteins encoded by the staphylococcal plasmid vector pC194 (32). Also indicated by + or - are the codon pairs which, according to Grosjean and Fiers (33), are most likely to be preferred or not preferred, respectively, by highly expressed genes. Their hypothesis predicts that efficient in-phase translation is facilitated by proper choice of degenerate codewords, and the codon pairs marked in Table II are most dependent on maximal codon-anticodon interaction energy.

Table II shows that among the chromosomal genes the codon usage is randomly distributed. The per cent G/C of the degenerate third base is 42%, similar to the overall GC content of the Bacillus species involved, which is 42-47% (34). In contrast, the plasmid-coded genes have a marked preference for A/U bases, only 22% G/C. Although the repetitive nature of the protein A gene makes statistical analysis risky, it seems to exhibit a clear preference for third position A/U bases with a few exceptions, UUC (Phe), AAC (Asn), and AGC (Ser). Two of these exceptions can be explained by the Grosjean and Fiers (33) hypothesis. Furthermore, among the four codon pairs in which, according to the theory, selection for C is preferred, this nucleotide is indeed chosen 64% of the time (67/105). In contrast, the four codon pairs with predicted selection for U show a reversed ratio, and only 21% C (18/85) can be found. The GC content at the third base of the codons is 32%, similar to the GC content of chromosomal DNA from S. aureus which is 30-33% (34). Therefore, the codon usage of the protein A gene shows a preference for A/U bases adapting to the overall GC content of the host cell with some exceptions, mainly following the Grosjean-Fiers (33) rules for highly expressed genes.
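The third-base percentages quoted in this section can be recomputed from codon counts. The sketch below is an illustration rather than the authors' program (which ran on an HP-85): the function name is hypothetical, and the counts are a transcription of the Prot-A column of Table II, in which several entries had to be separated from run-together digits, so they should be treated as a reading of the table. As in the table footnote, AUG, UGG, and AUA are left out of the third-base statistic.

```python
# Prot-A codon counts (preprotein, stop codon excluded), transcribed from
# Table II. Treat these as a reading of the printed table, not as new data.
PROT_A_CODONS = {
    "UUU": 2, "UUC": 12, "UUA": 20, "UUG": 5, "CUU": 7, "CUC": 1, "CUA": 6, "CUG": 2,
    "AUU": 8, "AUC": 9, "AUA": 1, "AUG": 6, "GUU": 5, "GUC": 2, "GUA": 6, "GUG": 2,
    "UCU": 5, "UCC": 0, "UCA": 3, "UCG": 2, "CCU": 21, "CCC": 0, "CCA": 8, "CCG": 2,
    "ACU": 5, "ACC": 1, "ACA": 4, "ACG": 0, "GCU": 25, "GCC": 1, "GCA": 11, "GCG": 5,
    "UAU": 8, "UAC": 1, "CAU": 6, "CAC": 1, "CAA": 38, "CAG": 2, "AAU": 20, "AAC": 45,
    "AAA": 51, "AAG": 18, "GAU": 21, "GAC": 19, "GAA": 37, "GAG": 1, "UGU": 0, "UGC": 0,
    "UGG": 0, "CGU": 3, "CGC": 3, "CGA": 0, "CGG": 0, "AGU": 3, "AGC": 12, "AGA": 0,
    "AGG": 0, "GGU": 18, "GGC": 14, "GGA": 1, "GGG": 0,
}

# Codons omitted from the third-base statistic in the Table II footnote.
EXCLUDED = {"AUG", "UGG", "AUA"}


def gc3_percent(counts, excluded=EXCLUDED):
    """Per cent of codons ending in G or C, ignoring the excluded codons."""
    used = {codon: n for codon, n in counts.items() if codon not in excluded}
    gc_ending = sum(n for codon, n in used.items() if codon[2] in "GC")
    return 100.0 * gc_ending / sum(used.values())


total_codons = sum(PROT_A_CODONS.values())
gc3 = gc3_percent(PROT_A_CODONS)
c_preferred = 100.0 * 67 / 105   # pairs with predicted selection for C
u_preferred = 100.0 * 18 / 85    # pairs with predicted selection for U
print(total_codons, round(gc3), round(c_preferred), round(u_preferred))
```

With this transcription the counts add up to the 509 codons of the preprotein, and the rounded third-base G/C value comes out at the 32% stated in the text; the two Grosjean-Fiers ratios round to 64% and 21%.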

TABLE II

n.l., not legible in the source. The entries of the preference column (Pref^d, + or -) are not legible.

Codon        Prot-A^a   Chrom^b   Plasmid^c
Phe  UUU         2        45        39
     UUC        12        20        11
Leu  UUA        20        34        35
     UUG         5        22        13
     CUU         7        31        10
     CUC         1         7         4
     CUA         6         3         5
     CUG         2        31         4
Ile  AUU         8        38        27
     AUC         9        30         5
     AUA         1        12        18
Met  AUG         6        29        12
Val  GUU         5        21        12
     GUC         2        21         1
     GUA         6        21        14
     GUG         2        30         4
Ser  UCU         5        20        16
     UCC         0        21         1
     UCA         3        31         7
     UCG         2        22         4
Pro  CCU        21        16        10
     CCC         0        11         5
     CCA         8        11         3
     CCG         2        25         1
Thr  ACU         5        13        14
     ACC         1        16         4
     ACA         4        48        15
     ACG         0        45         5
Ala  GCU        25        29         9
     GCC         1        36         1
     GCA        11        40         6
     GCG         5        38         1
Tyr  UAU         8        49        29
     UAC         1        33         9
Term UAA         0       n.l.      n.l.
     UAG         0       n.l.      n.l.
His  CAU         6        27        17
     CAC         1         8         1
Gln  CAA        38       n.l.       16
     CAG         2        35         6
Asn  AAU        20        68        43
     AAC        45       n.l.       12
Lys  AAA        51        79        56
     AAG        18        26        12
Asp  GAU        21        81        22
     GAC        19       n.l.        5
Glu  GAA        37       n.l.       19
     GAG         1        35        10
Cys  UGU         0         2         7
     UGC         0         2         4
Term UGA         0       n.l.      n.l.
Trp  UGG         0        35         9
Arg  CGU         3        18         4
     CGC         3         5         1
     CGA         0        10         3
     CGG         0         9         0
Ser  AGU         3        19        13
     AGC        12       n.l.        3
Arg  AGA         0        11        11
     AGG         0        14         4
Gly  GGU        18        22        11
     GGC        14       n.l.        2
     GGA         1        46         3
     GGG         0        20         3
Sum            509      1654       655
Per cent G/C^e  32        42        22

a Protein A including the signal peptide (preprotein).
b The sum of four Bacillus chromosomal genes, B. amyloliquefaciens α-amylase (25), B. subtilis α-amylase (29), B. subtilis spo0F (30), and B. licheniformis penicillinase (31).
c Four putative proteins of pC194 (32). As the start codons are yet to be identified, the total open reading frames are taken into account.
d The eight codon pairs which are most likely to be preferred (+) or not preferred (-) by highly expressed genes (33).
e Per cent G/C in the third degenerate base. The codons AUG (Met), UGG (Trp), and AUA (Ile) are omitted.

Homology Plot Analysis-In order to search for homologous regions, the DNA sequence and its deduced amino acid sequence were scanned by a computer program. Every point in the homology plots represents an identical residue (1). The nucleotide triplets and the deduced amino acids are compared in Fig. 5, A and B, respectively. As the sequence is compared with itself, a line of identity occurs from the left upper corner to the right lower corner, and homologous repeats show up as parallel lines, which disappear when no homology exists. The plots reveal two structurally distinct regions with internal homology, flanked by unique sequences without homology in the 5' and the 3' ends of the structural gene. Thus, the part of the gene coding for the signal peptide (S) as well as the promoter region (5') seems to be totally unrelated to the IgG-binding regions (E, D, A, B, and C) located in the middle of the gene. The part of the gene coding for the COOH-terminal part of region X as well as the 3' flanking sequence seems to be unrelated to both the repetitious region X and the IgG-binding regions. Comparisons between the plots show that the homology lines in Fig. 5A are more broken than those in Fig. 5B, which means that many of the nucleotide changes between the codons in the homologous regions have occurred in bases giving no amino acid change. These results strongly support the previously suggested hypothesis (27) of an evolutionary pressure in these regions keeping the amino acid sequence preserved.
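A self-comparison plot of the kind described above can be sketched in a few lines. This is a minimal illustration, not the authors' HP-85 software: the function names are hypothetical, a dot is recorded at the start of each matching three-residue window rather than at its center, and the test sequence is a toy built from three octapeptide units of the kind found in region X.

```python
def dot_matrix(seq, window=3):
    """Return all (i, j) index pairs whose `window`-long substrings are identical."""
    n = len(seq) - window + 1
    positions = {}
    for i in range(n):
        positions.setdefault(seq[i:i + window], []).append(i)
    return {(i, j) for hits in positions.values() for i in hits for j in hits}


def render(seq, window=3):
    """Draw the plot as text: '*' marks a match, '.' marks no match."""
    n = len(seq) - window + 1
    dots = dot_matrix(seq, window)
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(n)) for i in range(n)
    )


# Toy sequence (an assumption for illustration): three octapeptide units whose
# first six residues are identical and whose last two residues vary.
TOY = "KPGKEDNN" + "KPGKEDGN" + "KPGKEDNK"
dots = dot_matrix(TOY)
print(render(TOY))
```

In the printed grid the main diagonal is the line of identity, and the eight-residue repeat shows up as shorter parallel diagonals that break where the units differ, which is the pattern the text describes for Fig. 5.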

FIG. 5. Dot matrix comparisons of the protein A sequence. A, the entire nucleotide sequence and the immediate 5' and 3' flanking sequences are compared with itself. Each dot represents the center of a three-base identity, and direct repeats appear as parallel lines across the grid. B, the deduced amino acid sequence compared with itself.

FIG. 6. Comparisons of the IgG-binding regions and flanking regions. The sequences of the repetitive regions have been aligned to achieve maximal homology. The comparison is based on region B, and a nucleotide is marked with an asterisk and an amino acid is underlined when different from the B region. The cleavage points for trypsin are marked with arrows.

TABLE III
Comparison of amino acids of the IgG-binding regions
The values listed represent the number of changed amino acids of identically positioned residues when the regions are compared in pairs.

Region    E    D    A    B    C   Total
E         0   11   12   14   21    57
D        11    0    7   11   17    46
A        12    7    0    5   15    40
B        14   11    5    0   10    41
C        21   17   15   10    0    64

TABLE IV
Comparison of codons of the IgG-binding regions
The values listed represent the number of changed nucleotide triplets of identically positioned codons when the regions are compared in pairs.

Region    E    D    A    B    C   Total
E         0   31   25   26   36   118
D        31    0   21   25   28   105
A        25   21    0   14   30   101
B        26   25   14    0   20    86
C        36   28   30   20    0   115

Structure of IgG-binding Regions-The IgG-binding regions of protein A have been defined by trypsin cleavage of the mature protein into functional IgG-binding units D, A, B, and C (7, 27). Recently, we showed (10) that strain 8325-4 also contains a fifth region E homologous to the four repetitive regions earlier identified by protein sequencing. In Fig. 6 the sequences of the regions are aligned to enable comparisons. In order to achieve maximal homology, the boundary of these regions has been moved 15 nucleotides towards the 3' end of the gene. This choice is of course arbitrary as the 5' end and the 3' end of the repetitive region have diverged slightly. However, although the last five amino acids of region C (292-296) are changed compared to region B, more than half of the nucleotides (8/15) are homologous, indicating a relationship. The same holds for the other end of the repetitive region located in the beginning of region E. Although the first three amino acids are different from region D, five out of nine nucleotides are identical. The cleavage points for trypsin are marked with arrows. There exists a nine-nucleotide insertion in region E giving three amino acid residues (59-61) not homologous to the other regions. Also shown in Fig. 6 are the sequences flanking the repetitive regions. As already pointed out in the homology analysis (Fig. 5, A and B), these regions seem to be nonhomologous to the IgG-binding regions. A changed nucleotide compared to region B in Fig. 6 is marked with an asterisk, and a changed amino acid is underlined. Table III summarizes the amino acid changes and Table IV the codon changes between the regions. A comparison of the five regions with respect to mutual relationship reveals a pronounced homology gradient along the protein molecule, i.e. the closer the location of two regions, the higher the degree of homology. As already pointed out by Sjodahl (27), one interpretation of this phenomenon is that the primordial structural gene coding for the IgG-binding part of protein A has been subjected to stepwise gene duplications involving only one region followed by a period in which point mutations have occurred, thus generating slightly dissimilar nucleotide and amino acid sequences. As a result of these evolutionary events, a homology gradient will evolve. The fact that codons (Table IV) have changed much faster than amino acids (Table III) indicates that an evolutionary pressure exists to keep the amino acid sequence preserved. Since the number of total changes of codons is lowest for region B (Table IV), this region was chosen for the comparison in Fig. 6.
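The two kinds of counts compared above (changed codons versus changed amino acids between aligned regions) can be illustrated with a short sketch. It assumes the standard genetic code and two aligned, in-frame sequences of equal length; the function name is hypothetical, and the two 24-nucleotide sequences are invented examples, not sequences taken from Fig. 6 or Fig. 7.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (DNA alphabet).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def compare_regions(dna_a, dna_b):
    """Count changed codons and changed amino acids between two aligned,
    in-frame coding sequences of equal length."""
    if len(dna_a) != len(dna_b) or len(dna_a) % 3:
        raise ValueError("sequences must be aligned, equal in length, and in frame")
    changed_codons = changed_amino_acids = 0
    for k in range(0, len(dna_a), 3):
        codon_a, codon_b = dna_a[k:k + 3], dna_b[k:k + 3]
        if codon_a != codon_b:
            changed_codons += 1
            if CODON_TABLE[codon_a] != CODON_TABLE[codon_b]:
                changed_amino_acids += 1
    return changed_codons, changed_amino_acids


# Hypothetical units: K P G K E D N N versus K P G K E D G N.
UNIT_1 = "AAACCTGGTAAAGAAGACAACAAC"
UNIT_2 = "AAGCCAGGTAAAGAAGATGGCAAC"
result = compare_regions(UNIT_1, UNIT_2)
print(result)
```

In this invented pair four codons differ but only one amino acid does, so three of the four changes are silent. Applied pairwise to the aligned regions, counts of this kind would populate matrices shaped like Tables III and IV.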

Structural studies of protein A have suggested that 11 amino acids of the IgG-binding regions are essential for binding to the Fc part of the immunoglobulins (35). Most of these amino acids are assumed to be located in two α-helical regions (35). In region B, the corresponding residues are 183-192 and 198-211. As seen in Fig. 6, there are striking homologies in these two α-helices between the different regions, suggesting an evolutionary pressure to keep these residues intact. The changes observed are often outside the two helical areas, for instance, the changed His-Leu, at position 193-194 of region B, to Asn-Met, in regions E, D, and A. This pressure is even more pronounced when comparing the residues in these α-helices that interact with IgG. In region B, these amino acids are 184-186 (Gln-Gln-Asn), 188-189 (Phe-Tyr), 192 (Leu), 203 (Asn), 206-207 (Ile-Glu), and 210 (Lys). As seen in Fig. 6, there is a serine instead of asparagine at position 70, but all the other 49 residues are identical. Clearly, there is a strong pressure to keep these amino acids preserved.

Apart from the mutual homology between the five regions, there also seem to exist internal homologies in each region as revealed by traces of lines in Fig. 5, A and B. Hence, the nucleotide sequences coding for amino acids 179 (Lys) to 188 (Phe) and 196 (AAC) to 205 (Phe), all within region B, contain 24 identical out of 30 nucleotides. Another subregion of interest is the nine-nucleotide insert, giving the amino acids 59-61, which has been observed in protein A both from S. aureus Cowan I and 8325-4. This subregion (residues 57-62) is possibly related to other regions like amino acids 4-9 in the beginning of region E. A comparison nucleotide by nucleotide reveals that 14 out of 18 bases are identical between these two regions.

FIG. 7. Comparison of the repetitive units of region X and flanking regions. The sequences of the repetitive region have been aligned to achieve maximal homology. The comparison is based on region X1, and an altered nucleotide is marked with an asterisk and an altered amino acid is underlined. The cleavage point for trypsin which defines region X (7, 20) is immediately before amino acid 292 (Glu). The numbers refer to the amino acids in Fig. 3.

Structure of Region X-The repetitive nature of region X is indicated as multiple lines in Fig. 5, A and B, giving an approximately 300-base pair repetitive region (Xr) followed by a constant region coding for 81 amino acids (Xc). In Fig. 7, the 24-nucleotide repeats are aligned and a mutual comparison was performed. Again, a changed nucleotide is marked with an asterisk, and a changed amino acid is underlined. The 3' end of the repetitive region is obviously located at amino acid 392 (see Fig. 7) which is directly followed by the constant region. Since region C terminates at amino acid 296, the repetitive part of region X consists of exactly 12 units each with a length of 24 nucleotides. The boundary between region C and region X is, however, not clearly defined since the 12 last nucleotides, coding for the last four amino acids of region C, are identical with those coding for the corresponding amino acids of region X1 (Fig. 7).
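The residue numbers given for region X fit together arithmetically, as the following sketch shows. The function and key names are hypothetical; the inputs are only the figures quoted in the text (region C ending at amino acid 296, the repetitive part ending at 392, 24-nucleotide units, and an 81-residue constant part).

```python
def region_x_layout(region_c_end=296, repeat_end=392, unit_nt=24, constant_aa=81):
    """Derive the size of the repetitive part of region X from residue numbers."""
    repeat_aa = repeat_end - region_c_end          # residues 297 through 392
    repeat_nt = repeat_aa * 3
    units, leftover_nt = divmod(repeat_nt, unit_nt)
    return {
        "repeat_aa": repeat_aa,
        "repeat_nt": repeat_nt,
        "units": units,
        "leftover_nt": leftover_nt,
        "last_residue": repeat_end + constant_aa,  # end of the constant part Xc
    }


layout = region_x_layout()
print(layout)
```

On these numbers the repetitive part spans 96 residues, or 288 nucleotides, which is exactly 12 units of 24 nucleotides and matches the "approximately 300-base pair" description. Adding the 81-residue constant part brings the count to residue 473, the length given earlier for the mature protein.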

Structural studies based on the cleavage with trypsin (7, 20) have suggested that region X starts at amino acid 292, which differs by five amino acids from the boundary chosen in Fig. 7. As discussed above, the end of region C is probably related to the other IgG-binding regions, but this region has obviously diverged in the COOH-terminal end, generating a few amino acids identical with region X1. Therefore, structurally the octapeptide of region X seems to be repeated 12.5 times.

A comparison of the 12 repeated units reveals striking homologies. The six first amino acids (Lys-Pro-Gly-Lys-Glu-Asp) are identical throughout the Xr region. The two last amino acids are changed in a regular pattern between Asn-Asn, Gly-Asn, or Asn-Lys. Although the biological function of this extremely conserved octapeptide is not known, clearly there has been a strong pressure to preserve its amino acid sequence. Hence, 12 nucleotides have changed when comparing the six conserved amino acids in the 12 Xr compartments, all occurring in a wobble position and therefore representing silent mutations.

Apart from the distinct 24-nucleotide repeat, there are also signs of a 48-nucleotide repeat. Thus, the wobble base A/G in the codon coding for the first lysine is changed periodically in regions X7 to X12, and amino acid 7 is changed periodically between Asn and Gly in regions X5 to X10 (see Fig. 7).

There also seems to be some evidence for a homology gradient throughout the X region, although the gradient must be based on a 48-nucleotide repeat rather than the primordial 24-nucleotide sequence.

In conclusion, the evolution of the repetitive part of region X probably involved stepwise gene duplications of an ancestral 24- or 48-nucleotide long sequence. How this evolved at the molecular level is unclear, but the nucleotide sequence of the protein A gene from other strains, as well as genes coding for proteins with similar repeated structures, may help in resolving the molecular events causing stepwise multiple DNA duplications.

Acknowledgments-We are grateful to Dr. John Sjoquist for critical comments and advice. We thank Hans-Olof Pettersson and Bjorn Jansson for skillful technical assistance and Christina Pellettieri and Gerd Benson for patient secretarial help. We also thank Dr. Andras Gaal for introducing us to the thermostatic LKB Macrophor system and Dr. Stephen Fahnestock for a correction of the nucleotide sequence.

REFERENCES

1. Jeffreys, A. J. (1981) in Genetic Engineering (Williamson, R., ed) Vol. 2, pp. 1-48, Academic Press, New York
2. Fishetti, V. A., and Manjula, B. N. (1982) Semin. Infect. Dis. 4, 411-418
3. Hirano, H., Yamada, Y., Sullivan, M., de Crombrugghe, B., Pastan, I., and Yamada, K. M. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 46-50
4. Ohno, S. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 7657-7661
5. Hartley, I. L., and Gregori, T. J. (1981) Gene (Amst.) 13, 347-353
6. Tanaka, T. (1979) J. Bacteriol. 139, 775-782
7. Sjodahl, J. (1977) Eur. J. Biochem. 73, 343-351
8. Lindmark, R., Movitz, J., and Sjoquist, J. (1977) Eur. J. Biochem. 74, 623-628
9. Beachey, E. H., Seyer, I. M., and Kang, A. H. (1982) Semin. Infect. Dis. 4, 401-410
10. Lofdahl, S., Guss, B., Uhlén, M., Philipson, L., and Lindberg, M. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 697-701
11. Langone, J. J. (1982) Adv. Immunol. 32, 157-252
12. Boyer, H. W., and Roulland-Dussoix, D. (1969) J. Mol. Biol. 41, 459-472
13. Marinus, M. G. (1973) Mol. Gen. Genet. 127, 47-55
14. Bolivar, F., Rodriquez, R. L., Greene, P. J., Betlach, M. C., Heyneker, H. L., Boyer, H. W., Crosa, J. H., and Falkow, S. (1977) Gene (Amst.) 2, 95-113
15. Roberts, T. M., Swanberg, S. L., Poteete, A., Riedel, G., and Bachman, K. (1980) Gene (Amst.) 12, 123-127
16. Dente, L., Cesareni, Y., and Cortese, R. (1983) Nucleic Acids Res. 11, 1645-1655
17. Birnboim, H. C., and Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523
18. Morrison, D. A. (1979) Methods Enzymol. 68, 326-331
19. Maxam, A. M., and Gilbert, W. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 560-564
20. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
21. Uhlén, M., Nilsson, B., Guss, B., Lindberg, M., Gatenbeck, S., and Philipson, L. (1983) Gene (Amst.) 23, 369-378
22. Kozak, M. (1983) Microbiol. Rev. 47, 1-45
23. Shine, J., and Dalgarno, L. (1975) Nature (Lond.) 254, 34-38
24. McLaughlin, J. R., Murray, C. L., and Rabinowitz, C. (1981) J. Biol. Chem. 256, 11283-11291
25. Takkinen, K., Pettersson, R. F., Kalkkinen, N., Palva, I., Soderlund, H., and Kaariainen, L. (1983) J. Biol. Chem. 258, 1007-1013
26. Johnson, W. C., Moran, C. P., and Losick, R. (1983) Nature (Lond.) 302, 800-804
27. Sjodahl, J. (1977) Eur. J. Biochem. 78, 471-490
28. Movitz, J. (1976) Eur. J. Biochem. 68, 291-299
29. Yang, M., Galizzi, A., and Henner, D. (1983) Nucleic Acids Res. 11, 237-249
30. Shimotsu, H., Kawamura, F., Kobayashi, Y., and Saito, H. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 658-662
31. Neugebauer, K., Sprengel, R., and Schaller, H. (1981) Nucleic Acids Res. 9, 2577-2588
32. Horinouchi, S., and Weisblum, B. (1982) J. Bacteriol. 150, 815-825
33. Grosjean, H., and Fiers, W. (1982) Gene (Amst.) 18, 199-209
34. Fasman, G. D. (ed) (1976) CRC Handbook of Biochemistry and Molecular Biology: Nucleic Acids Section 3rd Ed., Vol. II, pp. 69-183, CRC Press, Inc., Boca Raton, FL
35. Deisenhofer, J. (1981) Biochemistry 20, 2361-2370
