Repetitive sequence-mediated rearrangements in Chlorella ellipsoidea chloroplast DNA: completion of...



Curr Genet (1991) 19:139-147 Current Genetics © Springer-Verlag 1991

Repetitive sequence-mediated rearrangements in Chlorella ellipsoidea chloroplast DNA: completion of nucleotide sequence of the large inverted repeat Takashi Yamada *

Department of Molecular Biology, Mitsubishi Kasei Institute of Life Sciences, 11 Minamiooya, Machida-shi, Tokyo 194, Japan

Received June 18/October 3, 1990

Summary. A 3 454 base pair (bp) sequence of the large inverted repeat (IR) of chloroplast DNA (cpDNA) from the unicellular green alga Chlorella ellipsoidea has been determined. The sequence includes: (1) the boundaries between the IR and the large single copy (LSC) and the small single copy (SSC) regions, (2) the gene for psbA and (3) an approximately 1.0 kbp region between psbA and the rRNA genes which contains a variety of short dispersed repeats. The total size of the Chlorella IR was determined to be 15 243 bp. The junction between the IR and the small single copy region is located close to the putative promoter of the rRNA operon (906 bp upstream of the -35 sequence on each IR). The junction between the IR and the large single copy region is also just upstream of the putative psbA promoter, 218 bp upstream from the ATG initiation codon. A few sets of unique sequences were found repeatedly around both junctions. Some of the sequences flanking the IR-LSC junction suggest a unidirectional and serial expansion of the IR within the genome. The psbA gene is located close to the LSC-side junction and codes for a protein of 352 amino acid residues. A highly conserved C-terminal Gly is absent. Unlike the psbA of Chlamydomonas species, which contains 2-4 large introns, the gene of Chlorella has no introns. The overall gene organization of the Chlorella IR is very different from that of higher plants, but a similar gene cluster of rrn-psbA is also found in the IR of Chlamydomonas species and in a single copy region of some chlorophyll a/c-containing algae, indicating a common evolutionary lineage of these cpDNAs. The origin and evolution of the IR structure are discussed in the light of these observations.

Key words: cpDNA evolution - IR expansion - psbA - rRNA genes - tRNA genes

* Present address: Department of Fermentation Technology, Faculty of Engineering, Hiroshima University, Saijo, Higashihiroshima 724, Japan

Introduction

The cpDNAs of a wide range of higher plants share a common molecular organization, involving circular molecules with an average size of 130-150 kbp and composed of two large inverted repeat sequences (IRs) separated by single copy regions of different sizes (Palmer 1985). In angiosperms, typical IRs are of the order of 20 kbp with single copy regions of 20 kbp and 80 kbp. Geranium cpDNA, whose IR is 76 kbp (Palmer et al. 1987), and the cpDNAs of a number of legumes, which have lost one arm of the IR (Palmer 1985), are exceptions.

As for non-angiosperms, the IRs so far studied are much smaller, ranging from 9.4 to 17 kbp, with a size of 9.4 kbp for the moss Physcomitrella patens (Calie and Hughes 1987), 10 kbp for the fern Osmunda cinnamomea (Palmer and Stein 1986), 11 kbp for the liverwort Marchantia polymorpha (Ohyama et al. 1986), and 17 kbp for the gymnosperm Ginkgo biloba (Palmer and Stein 1986). In spite of such a wide variation in the size of the IR, the order of the genes encoded on it is highly conserved in higher plants: for example, the tobacco IR (approximately 25 kbp) contains the genes for four rRNAs, seven tRNAs, five proteins and four unknown open reading frames (Shinozaki et al. 1986) and the 10 kbp region on the SSC side, which contains the genes for trnV, rrn, trnR and trnN, corresponds exactly to the IR (10 kbp) of the liverwort Marchantia polymorpha (Ohyama et al. 1986). The gene order of the remaining part of the tobacco IR (LSC-side) is the same as that of the LSC region just adjacent to the IR in the liverwort.

In contrast, most variant forms of cpDNA have so far been observed among algal species (Cattolico 1986). The largest cpDNA (400-600 kbp), for Acetabularia (Padmanabhan and Green 1987), and the smallest one (85 kbp), for Codium fragile (Hedberg et al. 1981), are both found in green algae (Chlorophyta). The gene arrangement on the cpDNA in Chlorella (Yamada and Shimaji 1987b; Yoshinaga et al. 1988), Chlamydomonas (Harris et al. 1987; Turmel et al. 1987) and Codium fragile


(Manhart et al. 1989) is very different from that of higher plants; this is especially true in the last, which lacks the IR structure. In Euglena gracilis (Euglenophyta), rRNA operons are not in an inverted order in its cpDNA but rather in a tandem array. The rRNA operon exists only once in red algae (Rhodophyta) such as Porphyra yezoensis and Griffithsia pacifica (Li and Cattolico 1987). In brown algae (Chromophyta), Dictyota dichotoma (Kuhsel and Kowallik 1987) and Pylaiella littoralis (Goër et al. 1988) both possess cpDNAs with very short IRs of no more than the rRNA operon size. On the other hand, the cpDNAs of Ochromonas danica and Olisthodiscus luteus contain much larger IRs (Reith and Cattolico 1986). Recently, the cpDNA of the cryptomonad Cryptomonas Φ (Cryptophyta) was shown to contain an IR of the rRNA operon size (Douglas 1988). Thus, it appears that the presence, or absence, of an IR and its size, if present, are highly variable among algae.

In order to understand the biological and evolutionary significance of the IR structure of cpDNA, which is highly conserved in a wide range of higher plants, it is important to study various forms of cpDNA. In the present study, the IR structure of the unicellular green alga C. ellipsoidea was examined in detail because of its peculiar organization (Yamada and Shimaji 1987b). It was found that the Chlorella IR was 15 243 bp long and contained the genes for three rRNAs, three tRNAs, psbA and four URFs organized in a unique manner. The origin and evolution of IR structure, conserved in higher plant and some algal cpDNAs, are discussed in the light of these observations.

Materials and methods

Algal and bacterial strains. C. ellipsoidea C-87 was obtained from the algal culture collection of the Institute of Applied Microbiology, University of Tokyo. E. coli HB101, JM101 and JM109 were used for bacterial transformation and propagation of plasmids.

DNA and RNA. Chloroplast DNA from C. ellipsoidea was prepared as described previously (Yamada 1982). Plasmid DNAs were prepared according to Maniatis et al. (1982). A gene library of C. ellipsoidea cpDNA was constructed by ligation of SstI-digested cpDNA fragments and SstI-digested pUC13 (Yamada et al. 1986) and transformed into E. coli HB101 and JM109. Subcloning was carried out using the same vector and hosts. Total chloroplast RNA was prepared from the isolated chloroplasts (Yamada 1982) by phenol extraction and salt precipitation (Davis et al. 1986).

Nick translation and hybridization. Southern, Northern and colony hybridizations were carried out according to Davis et al. (1986). DNA probes were labeled by nick translation with a Takara nick translation kit (Takara Shuzo) and [α-32P]dCTP (110 TBq/mmol, Amersham, Buckinghamshire).

Sequencing of DNA fragments. Restriction fragments containing the IR region of the Chlorella cpDNA were cloned into M13 mp18 and 19 (Norrander et al. 1983) in both orientations. Single-stranded DNA was sequenced by the chain termination procedure (Sanger et al. 1977), using [α-35S]dCTP (30 TBq/mmol, New England Nuclear, Wilmington, DE) and Sequenase (Toyobo Biochemicals). Both DNA strands were sequenced at least twice, and overlaps were obtained at each restriction site. Sequences were compiled and analyzed using GENETYX software on a NEC PC-98RX computer.

S1 mapping. S1 mapping, to determine the 5'- and 3'-ends of the psbA mRNA, was carried out with total cpRNA as described previously (Yamada and Shimaji 1987b).

Results

Location of psbA on the Chlorella cpDNA

When total C. ellipsoidea cpDNA, digested with five restriction enzymes (KpnI, PvuII, SacI, SphI and XbaI), was electrophoresed on an agarose gel, transferred to nitrocellulose and hybridized with a 32P-labeled 1.2 kbp SpeI fragment from pTB28, which contained the entire coding region of tobacco psbA (Sugita and Sugiura 1984), two hybridizing bands always appeared among the fragments produced by each restriction enzyme. They are fragments of 25 kbp and 20 kbp for KpnI, 12.0 kbp and 8.0 kbp for PvuII, 14.0 kbp and 8.0 kbp for SacI, 3.5 kbp and 2.3 kbp for SmaI, 25 kbp and 15 kbp for SphI and 12.5 kbp and 2.8 kbp for XbaI (data not shown). This suggests that there are two copies of psbA in C. ellipsoidea cpDNA. To determine the precise location and molecular structure of psbA, clones containing this gene were selected by colony hybridization with the tobacco psbA probe from the gene library of Chlorella cpDNA (Yamada and Shimaji 1986a). As expected, two different clones were obtained; one contained a 13.5 kbp SstI insert (pCCS14) and the other a 7.5 kbp SstI insert (pCCS65). Restriction maps for these clones are shown in Fig. 1 a. A region of approximately 4.0 kbp at the 3' end of both inserts gave the same restriction map. In this region there is a 1.5 kbp HindIII fragment that hybridized to the psbA probe (data not shown). Previous mapping studies of the whole cpDNA by Southern hybridization of the restriction fragments (Yamada et al. 1986) showed that, except for the IR on the Chlorella cpDNA, there were no extended repeat sequences. Thus, the 4.0 kbp sequence of pCCS14 and pCCS65 seems to be part of the IR. This was confirmed by hybridization of the psbA probe to the 9.0 kbp and 8.5 kbp EcoRI fragments (data not shown), which were previously found to contain a part of the IR including the genes for 23S rRNA, tRNA-Ala(UGC) and the 3' half of tRNA-Ser(GCU) in common and different sequences of the LSC adjacent to the IR (Yamada and Shimaji 1987b). Indeed, the nucleotide sequence determined around the SstI cloning site of both the pCCS14 and pCCS65 inserts included the 5' half of the coding region for the tRNA-Ser(GCU) gene (Yamada 1989).

Southern hybridizations, probed with a 300 bp PstI-SpeI fragment of tobacco psbA containing the coding region for the first 86 amino acid residues of the protein, revealed that the 5' end of Chlorella psbA was within a 600 bp SpeI-NheI fragment of pCCS14 and a 700 bp SpeI-NheI fragment of pCCS65 (data not shown). Thus, the psbA gene is located entirely within the IR of the C. ellipsoidea cpDNA.
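The inference from two hybridizing bands per enzyme to two gene copies can be illustrated with a small digest simulation. The sketch below is only a toy model: the genome length, cut sites and probe positions are hypothetical numbers chosen for illustration and are not taken from the Chlorella restriction map, and the helper name `probe_fragment_sizes` is likewise an assumption.

```python
def probe_fragment_sizes(genome_len, cut_sites, probe_positions):
    """Sizes of the fragments of a circular genome that carry each probe position.

    genome_len and all coordinates share one arbitrary unit (kbp here).
    """
    cuts = sorted(cut_sites)
    sizes = []
    for pos in probe_positions:
        # Nearest cut at or before the probe; wrap around the circle if none.
        left = max((c for c in cuts if c <= pos), default=cuts[-1] - genome_len)
        # Nearest cut after the probe; wrap around the circle if none.
        right = min((c for c in cuts if c > pos), default=cuts[0] + genome_len)
        sizes.append(right - left)
    return sorted(sizes, reverse=True)


# Hypothetical 100 kbp circle with four cut sites for one enzyme.
toy_cuts = [5, 30, 42, 80]

# One gene copy gives one band; two copies in different flanking
# contexts (as for a gene inside an inverted repeat) give two bands.
one_copy = probe_fragment_sizes(100, toy_cuts, [10])
two_copies = probe_fragment_sizes(100, toy_cuts, [10, 60])
print(one_copy, two_copies)
```

Two bands of different size arise whenever the nearest cut sites flanking the two copies differ, which is what is expected when at least one cut falls in the single copy DNA outside the repeat.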

Location of the junction between the IR and the LSC

Fig. 1 a, b. Restriction maps of the pCCS14 and pCCS65 inserts (a) and the pCCS208 and pCCEX302 inserts (b). The coding regions for psbA and trnS are shown by boxes. The coding region for a 5' part of 16S rRNA is shown below the pCCEX302 map. Restriction sites for BglII (B), EcoRI (E), HindIII (H), HaeIII (Ha), SacI (S), SalI (Sl), SpeI (Sp), Sau3A (S3), XbaI (X) and XhoI (Xh) are as indicated. The size bars for each map represent 1 kb. Sequencing strategies are given under the map

Figure 1 a shows that the same pattern of restriction sites between pCCS14 and pCCS65 continues through the psbA region and ends beyond this gene, indicating that the endpoint of the IR is very close to the 5' end of psbA. Therefore, the 1.55 kbp SpeI-XbaI fragment of pCCS14 and the 1.65 kbp SpeI-XbaI fragment of pCCS65, containing the coding region of psbA and the endpoint of the IR, were subcloned in order to determine the precise location of the junction point. Fine maps for these clones, shown in Fig. 1 a, indicate that the junction point is within a 350 bp SpeI-HindIII fragment of pCCS14 and a 450 bp SpeI-HindIII fragment of pCCS65. The nucleotide sequences of these fragments were determined by the strategy indicated in Fig. 1 a and are shown in Fig. 2 a. The junction is not within any coding region, but is very close to the putative promoter sequence of psbA, 218 bp upstream of the ATG initiation codon. Though there is no obvious sequence homology beyond the junction, AT-clusters of 10-15 bp occur frequently in both flanking LSC regions.
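The junction is read off as the point where two clone sequences, unrelated on the LSC side, become identical on the IR side. A minimal sketch of that comparison follows, assuming both sequences end at the same restriction site so that the shared IR portion is a common suffix. The sequences are hypothetical toy strings, not the JLA and JLB sequences of Fig. 2 a, and `common_suffix_length` is an illustrative helper name.

```python
def common_suffix_length(seq_a, seq_b):
    """Number of identical positions counted back from the 3' ends."""
    n = 0
    for a, b in zip(reversed(seq_a), reversed(seq_b)):
        if a != b:
            break
        n += 1
    return n


# Hypothetical clones: different LSC flanks joined to the same IR segment.
lsc_flank_a = "ACGTTGCA"
lsc_flank_b = "TTAGCCAT"
shared_ir = "GGATCCTTAAGC"
clone_a = lsc_flank_a + shared_ir
clone_b = lsc_flank_b + shared_ir

shared = common_suffix_length(clone_a, clone_b)
# 1-based position of the last LSC nucleotide in clone_a; the junction
# lies between this position and the next one.
last_lsc_position = len(clone_a) - shared
print(shared, last_lsc_position)
```

A chance match of one or two bases immediately upstream of the true junction would shift the inferred boundary by that many positions, so a boundary found this way is precise only to within such coincidental identities.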

Location of the junction between the IR and the SSC

Previous studies showed that the XhoI site located 5.8 kbp upstream of the 16S rRNA gene on one side of the SSC was absent from the other (Yamada 1983) and, therefore, one endpoint of the IR has to be between the XhoI site and the 16S rRNA gene. A 2.5 kbp XhoI-EcoRI fragment containing this region was cloned (pCCEX302) and a part of it was sequenced (Yamada and Shimaji 1987b). Using this clone as a probe, clones containing the other endpoint of the IR were screened by colony hybridization from the gene library (Yamada and Shimaji 1986a; Yamada et al. 1986). Two kinds of clones were obtained and designated as pCCS114 and pCCS208. Clone pCCS114 contained a 6.0 kbp insert, which included the entire insert of pCCEX302 (2.5 kbp), while pCCS208 contained a 9.5 kbp insert. Figure 1 b shows restriction maps of pCCS208 and pCCEX302. Since the same pattern of restriction sites between the two clones occurs on the 3' side of the BglII site on pCCEX302, one junction must be within the 850 bp region between XhoI and BglII on pCCEX302. The nucleotide sequence of this region was determined by the strategy outlined in Fig. 1 b and is shown in Fig. 2 b. Subcloning of the pCCS208 fragment corresponding to this region was difficult; several trials with different vectors, different host strains, and different restriction enzymes all failed. Therefore, the nucleotide sequence of the other junction region was determined with pCCS208 and a synthetic oligonucleotide, 5'-AGATCTAAATTTTGTTCT-3', which was complementary to the sequence just upstream of the BglII site of pCCEX302. The nucleotide sequence is compared with that of pCCEX302 in Fig. 2 b. The endpoint of the IR was identified between positions 162 and 163 where the sequence homology ends. This endpoint is also not within any coding region and is close to the putative promoter region of the 16S rRNA gene (906 bp upstream of the -35 region). Determining both endpoints of the IR made it possible to estimate the size of the IR to be approximately 15 kbp; this is less than the value of 22.5 kbp determined previously by electron microscopy (Yamada 1983).
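The complementarity of the sequencing primer can be checked directly. In the sketch below, the primer is the oligonucleotide quoted in the text, and `template_end` is transcribed from the last 23 nucleotides of the pCCEX302 (JSA) sequence as printed in Fig. 2 b; the transcription is an assumption of this example, as is the helper name `reverse_complement`.

```python
# Watson-Crick complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


primer = "AGATCTAAATTTTGTTCT"              # 5' -> 3', from the text
template_end = "TTTTTAGAACAAAATTTAGATCT"   # 3' end of JSA in Fig. 2 b (transcribed)
bglii_site = "AGATCT"                      # BglII recognition sequence

target = reverse_complement(primer)
print(target)
print(template_end.endswith(target))       # primer anneals at the very end
print(template_end.endswith(bglii_site))   # the printed sequence ends at BglII
```

As transcribed here, the primer covers the BglII site itself plus the 12 nucleotides upstream of it, so extension from it reads back toward the junction.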

Determination of the entire nucleotide sequence of the IR

A map of the entire IR region of C. ellipsoidea cpDNA is shown in Fig. 3. The nucleotide sequences of some parts of this region have already been reported: namely, 892 bp of the upstream regions of the 16S and 23S rRNA genes (Yamada and Shimaji 1987b), 1 532 bp of the coding region for 16S rRNA (Yamada 1988), 4 894 bp of the 16S-23S rRNA spacer region (Yamada and Shimaji 1986a), 3 468 bp of the coding regions for 23S rRNA (Yamada and Shimaji 1987a), for 5S rRNA (Yamada and Shimaji 1986b) and for tRNA-Ser(GCU) (Yamada 1989). The remaining parts of the IR were sequenced by the strategy outlined in Fig. 1 a and the sequence (3 202 bp) is shown in Fig. 2 c and d. Addition of this sequence to the 252 bp SSC-side sequence made it possible to determine the total size of the Chlorella IR to be 15 243 bp.
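The lengths quoted above can be cross-checked against the Summary with simple arithmetic. The sketch uses only figures stated in the text; the variable names are illustrative, and the remainder attributed to earlier reports is derived here by subtraction rather than stated by the author.

```python
newly_sequenced_bp = 3202   # remaining IR sequence (Fig. 2 c and d)
ssc_side_bp = 252           # SSC-side junction sequence
ir_total_bp = 15243         # total IR size reported

# New sequence reported in this paper; the Summary gives 3 454 bp.
new_total_bp = newly_sequenced_bp + ssc_side_bp

# Portion of the IR implied to be covered by earlier reports (derived).
earlier_bp = ir_total_bp - new_total_bp

print(new_total_bp, earlier_bp)
```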

Only two genes, psbA and trnS (GCU), were found in this region by computer analysis. Matrix analyses, how- ever, revealed a surprisingly complicated structure for it: a region of about 1.0 kbp (Fig. 2d, positions 354-1 319) between psbA and trnS is interposed between a pair of inverted repeat sequences of 185kbp (/~-elements). Within this region, there is a pair of a-elements, which were previously found in the 16S-23S rRNA spacer re- gion as terminal repeated sequences of a transposon-like structure (Yamada and Shimaji 1986a). The sequence between the a-elements is a chimera composed of se-

Page 4: Repetitive sequence-mediated rearragements in Chlorella ellipsoidea chloroplast DNA: completion of nucleotide sequence of the large inverted repeat

]0 20 30 40 50 60 70 80 90 lO0 JLA ACTAGTTTCT TTAGTAAATT ATCATCCAGG TTTAAATTGA AGACTCGACT TAAATACTTT TTTGTAGGGA AATTGAAATA ATCTTTATTG CTTGTTTTTT

1]0 120 130 ~40 150 160 ~70 ]80 JLA CTAAAGTT/T TTCCCATTCT GCTAAATACT TTTTTAAACT TGTIAAAAAA TCCCCAACGG AGTTTTTAGG ATATCAAGTT JLB ACTA GTTTCTTTAA ATATAGAGGG ~GGCTTCCTA

190 200 GTTTTAATAT TACGAAAAGA AATTAAAAAC GAAACGCAGC

E 210 220 230 240 250 260 270 28D 290 300

JLA AAAAGACTGT AAGGAGATTT AAAAAGCAGC TTTATAATTG GTGTCCAATC AAAAGATGCT ACTCTTTGCT TGTGATTACT GTACAGCTCA AGATAGAAAA JLB GCC~G~I'GGT AAGAACTAAT TAGTTCTTAC AGAAAAAAAA TCCi~AAGTAA GACCCATAAA ACTTGTGTTC TGGACTAGGG CGTACGITATT ATTAGGCTTT

F 310 320 330 340 350 360 370 380 390 400

JLA CTGTTGCTGA AATTATTATT TCGCATAAGC TAGACTCTCT TACATCTCAT TGCTAATCGC TTTATACTAC GCCTAAGAAA CCTTTCTAAA GACGCCAATG JLB CCCCGAAGGC TTTCACAAGC GTACGTTGTG AAAACTCTAT GTTTTACCGT TGCCCTGCCG GACACCGAAG GGTCGGAGAT AAAACATTTA GGGGAACCTT

B 420 430 440 450 460 470 480 490 500 410

JLA TATATTAAGC JLB TGTC ~HHH~X-*

TTTGCTTGCC TACCAAAGGG CAACACTTTG TGTTTTTATT TAATACCAAT AAATTA~TA GTACAGAGAA GACCATAGTT TTGCTTAAATT

c Fig. 2 a - d . Nucleotide sequences of the junction between the IR and the LSC (a), the junct ion between the IR and the SSC (b), the psbA region (c) and the chimeric region downstream ofpsbA on the IR (d). a Sequences of the SpeI-HindIII fragments ofpCCS14 (JLA) and pCCS65 (JLB) are compared. Stars indicate nucleotide se- quences identical between the two clones. The junction was found

10 20 30 4o 60 60 70 80 90 100 AAGCTTTGCTTGCCTA~CAAAGGGCAACA~TTTGTGTTTTTATTTA/~TA~CAATAAATA~GTA~AGAGAAGACCATAGTTTTG-~CTTAA.m-AAT~'TTCTT

210 F 2 2 0 230 240 250 260 270 280 290 300 AGAGTTAAAAATTAT.•lATG•CTGCT•TTTTAGAAAGACGTGAAAGCGCT•GCCTATGGGCTCGCTTCTGTGAATG••TTACTAGC•CTGAAAACCGTTT ^

~ etThrA~a~eLeuG~uArgArgG~u~erA~a~erLeuTr9A~aArgPhe~ysG~uTrp~eThrSerThrG~uAsnArgLeu 310 320 330 340 350 360 370 380 390 400

TACATCGGTTGGTTTGGTGTTCTA•TG•TCCC•A•TTT•TTAACTGCAACTTCTGT•TTTAT•ATCGC•T•••T•GCTGCAC•TCCAGT•G•TATCGATG Tyr••eG•FTrpPheGl••alLeuNet•lePr•ThrLeuLeuThrAlaThrSer•alPheIle[leAlaPhe[leA•aA•aPr•Pr••a••splleAspG•y

410 420 430 440 450 460 470 480 490 500 GTATTCGTGAGCCTGTTTCTGGTTATTTACTTTACGGA•ACAATATC•TTT•TGGTGCTGTTGTTCCA•CTT•A•A•GCGATTGGTCTTCACTTCTACCC

•|eArgGluPr••alSerGlyTyrLeuLeuTyrGl•AsnAsn•leI•e•erGl•Ala•al•alPr•ThrSerAsnA•aI•eG•yLeuHisPheTyr•r•

510 520 530 540 550 560 570 680 590 600 AATTTGGGAAGCTGCTTCTTT•GACGAGTGGTT•TAC•ACGGTGGTCCTTACCAACTT•TCGTTTGCCATTTCTTCTTAGGTATCTGCTGCTACATGGGT

••eTrpGluAlaAlaSerLeuAspG•uTrpLeuTyrAsnGlyG•yPr•T•rGlnLeu••eValCysHisPhePheLeuGlyIleCysC•sTyrHetG••

610 620 630 640 650 660 670 680 690 700 cGTGAGTGGGAACTTTCTTTCCGTTTAGGTATGCGTCCTTGGATTGCTGTAGCTTACTCTGCTCC•GTTGCTGCTG•T•CTGCTGT•TTTATCATTTACC ArgG~uTrpGluLeu~erPheArgLeuGlY~etArgPr~Trp~eA~a~a~AlaTyrSerA~aPr~alA~aAIaAlaThrAlaValPheIlelleT~rPr~

710 720 730 740 750 760 770 780 790 800 CTATCGGTCAAGGTTCTTTCTCTGATGGTATGCCTTT•GGTATTTCTGGTACTTTCAACTTC^TGATCGTATTCC••GCTG•ACACAACATCTTAATGCA

~IeG~yGlnGlY~erPhe~erAspGlyMetPr~LeuG~[e~erGlyThrPheAsn~heMet~le~alPheGlnA~aGluH~sAsnIleLeuMetH|s

810 820 830 840 850 860 870 880 890 900 •CCATTC••CATGCTTGGTGTTGCTGGTGTTTTTGGTGGTTCTTTATTCTCTGCTATGCACGGTTCTCT•GTA•CTTCTTCTTT••TCCGTG•AACTACT Pr~PheHis~etLeuG~yVa~AlaG~Va~PheGlyGly~erLeuPhe~erA~a~etHisGIySerLeuVs1ThrSerSerLeuI~eArgGluThrThr

910 920 930 940 950 960 970 980 990 1000 GAGAATGAATCTCGTAA•GCTGGTTAC•AATTTGGT•A•GAAGA•GAAACTTAC•ACATCGT•GCTGCTCACGGTTACTTTGGTCGTTTAATCTTC•AAT GluAsnG~uSerAr&Asn~laG~Tu

1010 1020 1030 1040 1050 1060 1070 1080 1090 1100 ACGCTTCTTTCAACAACTCTCGTTCTCTACACTTCTTCCTAGCTGCTTGGCCTGTAGTTGGT^TCTGGTTCACTGCTTTAGGTATTTCAACTATGG•ATT

Ala~erPheA~nAsnSerArg~erLeuH~sPhePheLeuAlaA~aTrpPr~al~alG~y~|eTrpPheThrA~aLeuGly~eSerThrNetAla~he

III0 1120 1130 1140 1150 1160 1170 1180 1190 1200 CAACCTAAATGGTTT•AACTTCAA•CAATCTGTTGTAGA•TCTCAAGGTCGTGTAAT•AACA•TTGGGCTGACATTATTAA•CGTGCTAA•TTAGGTATG AsnLeuAsnGIyPheAsnPheAsnG~nSerVa~Va~AspSerG~nG~yArgVa~eAsnThrTrpA~aAsp~e~e~snArgA~aAsnLeuG~y~et

1210 1220 1230 1240 1250 1260 1270 I 1280 1290 1300 GAAGTAATGCACGAACGTAACG~G~A~AACTTCCCTCTAGACTTAGCTTCTGTTGAAG~T~TT~AATTGCG~AAT~AAAGCAATCGAAGCACAGAAGAT GIuValMetH|sGluArgAsnAlaHisAsnPheProLeuAspLeuAlaSerValGluAlaProSerlleAla~*~***

1310 1320 1330 1340 1350 1360 1370 1380 1390 1400 TATGTCTAATAA•T•TCTGTAGAGACAGAAGAATATCCCCCCTA•GGTATAGGCTTCAATCCTTCTCAAGTTT•AAAGCACACCTTCCATTGTTTGAAT•

1410 1420 1430 1440 1450 .~TTG3ATTCAAATTAAACTTGCAAAAATAT~TTTTATATCTAAGTTTAAAAGCTT

between positions 404 and 405. A sequence of about 90 bp of the IR a~acen t to the HindIII site is also shown. Sequence elements re- peated on the IR are boxed (B, E, and F). b the sequences of pCCEX302 (3SA) and pCCS208 (JSB) are compared. Sta~ indicate identical nucleotide sequences. The junction was found between positions 162 and 163. e the HindIII site (positions 1-6) corre-

Page 5: Repetitive sequence-mediated rearragements in Chlorella ellipsoidea chloroplast DNA: completion of nucleotide sequence of the large inverted repeat

10 20 JSA GTGGCAATCA AGGACCTATA JSB

110 120 JSA TCACCGGAAA TGCTCAGTCT JSB ATACGCTCTA CGCGATTTCG

30 40 50 60 70 80 90 lO0 GGATACAAAA TGACAAATTA CCACTTCATC CGATTTTGGA GCGACAAGCT CGGCAATCTT TAAAATACTT TTGTTCTTCC

TCG

130 ]40 150 160 170 180 190 200 CGAATACAAG TTCATGAAAT ATAACCTTGC AATAGACCAA CAAGAGAAGC AAATCGAAAA GTGGGTCGAA AAGAGCCTAG ACCGCGTAGA AGTGATGGCC GATCAAGAAG GTGATTGCGG TC =X==X=== =======x== ,-w-w-==:==:H~ ~,-x--H;xxx~,-~

210 220 230 240 250 260 270 280 290 300 JSA AGCTACCACC TGAATTTAGT TGCTTGGTAT TCGACACTAA TGTTTTGATA AACAGTATTA CACGCGATTC ACACAATATA AAACGTTTAG CCTTCAAGGC

310 320 330 340 350 360 370 380 390 400 OSA AGACTATACC CCTGAAATTA TAATATTCTT ACAAGATTTT ACAGCTAAAT GTAAACATGT CCGTGCAGCT CAAACTCATC TAAATTCTAC TGCTTATCAA

410 420 430 JSA TTTTTAGAAC AAAATTTAGA TCT JS8 ~-w.w-.x-w-w-w-w~ .w-xx=::.:==:

b

10 C 20 30 40 50 60 70 80 90 100 'AAGCTTTCGC CCATTGGTTG CATTCGCACA AACATTTTAA AC[AAAGTTTT GTjTCGTAGAA C z~ACAAAACA TTGAGACTAG AAAATATAAA CTAAAACGCT

110 120 130 140 150 160 17,9 _J~_~ ,~.j~,~O . . . . . . . . 200 AGGTTACGGC GGTTTTAAAG TAAGCAATGA CAAGTTGGCT TATAACAAAC AATTTGCATT T T A A A A T A ~ ~

3'60 370 380 390 400 ,i~:~ TCTACT TCGTGGAAGT CAATGCTGGT CAAATTTTCT TGTCTTTACC

410 420 430 440 450 460 470 480 490 500 CTTTGGTAAA GACAGAAGGA TCAGGTTATT TTTTTAACGC TGGTCTTCGT CCAGTAATTA ACGTAGCAGT ATAAAACATA AAGTTTTACG CTTTGCCCCA

510 520 530 540 550 560 E 570 580 590 600 TAAAACTTGG CTTTATAAGT TTGTGCTTTA CCCGTTGCTA AATGTTTTAT C~GGCTTCCTA AATTAAAAAC GAAGCGCAGC GCCT~CTCCAT TGTTTGACTT

610 620 630 640 F 650 660 670 680 690 700 CTGCGAAGCA GAAGTTAAA[r GGTAAGAACT AATTAGTTCT TACAGAAAAA AAGTCCb~rAA TTCGTTCGCA CAAATTAAGA TCTTGTGCTT AGCACAAGTC

710 720 730 ~ ~ z ~ ' ~ / ~ ~ 770 780 790 800

810 92o 64o 860 850 87o B 860 69o 9o0 TGTTCTGGGC 'rACGGCGTAC GTC~A~TAGG CTTCCCCCGA AGGCTTTCAC AAGCTACGCT TGTGAAAACT TTATGTTTTA CCGTTGCCC, GCCGGACACG

910 920 930 940 950 950 C 970 980 990 1000 ~GCAAGGGTCG GAGATAAAAC ATTTAGGGGA ACCTT!~GTTr AAAAGCT~CG CCCATTGGTT TGGCATTCGC ACAAACGTTT TAAAC~I'TAGA GATAAAACAT

1010 1020 1030 1040 1050 1060 1070 1080 1090 1100 AAAGTTTTAA AACGTACGCT TGTTAAAGCC TTTCAACGC~. AGGGTATACG AACAAAGTTC GTCAACCTTA CTACCAACGC AGAAGACTTT TTTTCCTGCG

1110 1120 1130 1140 1150 11600" 1170 1180 1190 G ~ 1200 ~GCGCATCGT TGGTAGTTTA GGCA~ACTTA TAAAACTTTG TATAAATACA ' i':~ :!~

~ " ~ .... : " ATTA 1250 1260 1270 1280 1290 1300

1310 132 1400

I GAAGACTTC CACGAAGTG ' ' i

1510 1520 1530 1540 ~ TTCT_~ CAAAGTAACA CCAGAGGTGT TACTAGGATT

1610 1620 1630 1640 GATCTTCGAT CAAAGCCTAA TACAATCGGA GATTGTATTT

1710 1720 1730 1740 GTGTTCTGCC TTCGCTTGTT TTGGTT~GAG TGATGTCTGA

1810 1820 ~,830 1840 C~CCTCTCACT CCGJTACTTGT [A'ACCCGGGAG CCTAAGTATA

1910 1920 1 9 3 0 1940 TTGTGTGCTT CGCCCTTTGT ACCAAAGGGT AAAGACACAA

2010 D 2 0 2 0 2030 2040 CCAACGCAG^ AGACTTTTTT TCTGCGCTGC GCTACGTTGG

d

1550 1560 1570 1580 1590 1600 TTCTCTTTCA GAGA~ATCTT TCAGAGA~GA TCTAAGTGTA ACGATGTTAC ACAAAGTTTT

1650 1660 1670 1680 1690 1700 TGGACTTGAT CGAAGTTCAA CATCTTAATA GAGCGTAGCT TGTAAAACTA TTATGGACTT

1750 1760 t [ n ~ 1770 1780 1790 1800 GTGGCCGAAA GAGCTCGATT GCTAATCGAG TATACAGCTC CCTGTACCGA GGGTTCGAAT

1850 1860 G 1870 1880 1890 1900 AATCACACTC TCCATTGTTT GAATACAAAG TATTCAAAAT GTGAAAATAT TTt~TACAAAG

1950 1960 1970 1980 1990 GTTGCC~AGG GTATACGAAC AAAGTTCGTC AGCCTAGGTT CCTAACTATT

2050 2060 2070 2080 2090 AGTTTAGGCA AJCAATATAAA GTTTTAACAA GCTACGAGCC CCCTTGCCAT

sponds to tha t at pos i t ions 4 0 7 - 4 1 2 in a. A n O R F forpsbA and the pu ta t ive p r o m o t e r sequences ( - 1 0 and - 3 5 ) are all boxed. The 5'- and 3 ' -ends o f the psbA m R N A , as de te rmined by S l - m a p p i n g , are shown by vertical arrowheads. Horizontal arrows indicate smal l in- verted repeat sequences . The wavy line shows a Sh ine -Da lga rno (SD) sequence, d the HindIII site (posi t ions 1 - 6 ) co r r e sponds to

2000 ^ACCTTACTA

2100 TGCTTTCTAA

tha t at pos i t ions 1 451 I 456 in e. Repea ted sequences (B, C, D, E, F a n d G) and the gene for trnS ( G C U ) are all boxed. Shadowed boxes indicate repeated sequence e lements (p and a) associa ted with t r ansposon- l ike s t ruc tures (Yamada and Shimaji 1986a). Arrows show smal l inver ted repeat sequences

Page 6: Repetitive sequence-mediated rearragements in Chlorella ellipsoidea chloroplast DNA: completion of nucleotide sequence of the large inverted repeat

144

Fig. 3. Map of the inverted repeat (IR) region of C. ellipsoidea cpDNA. Coding regions are indicated by boxes. The orientation of transcription is shown by arrows. Triangles represent repetitive sequences (α, β and σ) found in this region. P1 and P2 are the putative promoters of back-to-back rrn operons (Yamada and Shimaji 1987b). LSC, large single copy region; SSC, small single copy region; psbA, gene for the photosystem II thylakoid protein D1 or the 32 kDa QB-binding protein. The nucleotide sequences of regions A and G are shown in Fig. 3; those of B-F were reported previously (Yamada and Shimaji 1986a, 1987a, b; Yamada 1988). Sizes are shown in bp

quences upstream of the IR-LSC junction (B, 125 bp), downstream of psbA (C, 49 bp) and downstream of trnS (D, 87 bp), as indicated in Fig. 2d. Most interestingly, the 125 bp sequence of B is a direct repeat of the IR-LSC junction sequence of the LSC side (Fig. 2a and d). In the vicinity of this part, there are also two fragments of the LSC-junction sequence repeated directly, E (33 bp) and F (38 bp). Such a mosaic structure of the 1.0 kbp region flanked by β-elements strongly suggests frequent recombinational events within it. It is interesting to note that there is an 11 bp sequence (CTCCAAAA/GTAA), repeated tandemly, next to the β-elements, giving a structure typical of transposon-integration sites. Transposable element-like structures (both α-linked and σ-linked) were previously found in the 16S-23S rRNA spacer region of the C. ellipsoidea IR (Yamada and Shimaji 1986a). Thus, three kinds of transposable element-like structures (4.5 kbp in total) occupy about a third of the C. ellipsoidea IR (15 243 bp).

Nucleotide sequence and deduced protein sequence of psbA

The coding region of C. ellipsoidea psbA contains 1 056 bp, which corresponds to a protein of 352 amino acid residues. In contrast to the psbA genes of Chlamydomonas reinhardii (Erickson et al. 1984), C. smithii (Palmer et al. 1985) and C. moewusii (Turmel et al. 1988), which contain four, three and two large introns respectively, the Chlorella psbA contains no intron. In Fig. 4, the predicted amino acid sequence of the Chlorella gene is compared with the corresponding proteins of the blue-green alga Anacystis nidulans (Golden et al. 1986), the prochlorophyte Prochlorothrix hollandica (Morden and Golden 1989), the unicellular green alga C. reinhardii (Erickson et al. 1984) and tobacco (Sugita and Sugiura 1984), with which it shows homologies of 89.5%, 88.1%, 93.2% and 93.8%, respectively. As in the higher plant proteins, there is a seven-amino-acid gap near the C-terminus relative to the blue-green algal proteins (Golden et al. 1986). In addition, the Chlorella protein lacks the C-terminal Gly residue, resulting in a protein which is one amino acid shorter than that of higher plants. This Gly residue is also absent from the psbA protein of C. reinhardii (Erickson et al. 1984). Figure 4 shows that all the proposed functional residues making up the domains of this protein (Rochaix and Erickson 1988; Gingrich et al. 1988) are also conserved in Chlorella [e.g. the chlorophyll- and non-heme iron-binding residues of His 198, 215 and 272 and the QB- and herbicide-binding domains from residues 219 to 275 (except for 233-238)].
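Pairwise percentages of the kind quoted for Fig. 4 can be obtained from an alignment by counting matching columns. The following is a minimal sketch, assuming two pre-aligned strings of equal length with "-" marking gaps. The function name and the toy sequences are hypothetical and are not taken from the paper. The treatment of gaps is also an assumption: columns with a gap in either sequence are simply skipped here, which may differ from the procedure actually used.

```python
def percent_identity(ref: str, other: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(ref, other):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical toy alignment (not real psbA data): 9 comparable columns, 8 identical.
toy_ref = "ACDEFGHIKL"
toy_other = "ACDEYGHI-L"
identity = percent_identity(toy_ref, toy_other)
```

Whether gapped columns are excluded or counted as mismatches changes the result, so the convention should be stated alongside any reported value.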

Codon usage in the C. ellipsoidea psbA is strongly biased against G in the third position; codons ending in G account for only 8.2% of the total. Such a bias is also found for rbcL from another strain of C. ellipsoidea (Yoshinaga et al. 1988). The termination codon for psbA, TAA, is the same as in all psbA genes so far studied. The 5' and 3' ends of the psbA mRNA were determined by S1 mapping and are indicated on the DNA sequence shown in Fig. 2c. A putative transcription initiation site is found about 60 bp upstream of the first ATG of the reading frame (Fig. 2c), where the -35 and -10 sequences correspond to TTGTTC and TGTATT respectively. There is a Shine-Dalgarno (SD) sequence, AAGG, several base pairs downstream from the transcription initiation site. S1 mapping of the 3' end of the message showed that the termination site is about 166 bp downstream from the stop codon. A stem-loop structure typical of a prokaryotic termination signal can be formed in the sequence around this site (Fig. 2c).
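The third-position bias described above can be tallied directly from a coding sequence. The sketch below is illustrative only: the function name and the short toy sequence are hypothetical, and it is an assumption that the stop codon is included in the count, since the text does not say how the 8.2% figure was computed.

```python
from collections import Counter


def third_position_usage(cds: str) -> dict:
    """Return the percentage of codons ending in each base (A, C, G, T)."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    counts = Counter(codon[2] for codon in codons)
    total = len(codons)
    return {base: 100.0 * counts.get(base, 0) / total for base in "ACGT"}


# Hypothetical 10-codon toy sequence (not the real psbA gene), stop codon included.
toy_cds = "ATGACTGCAATTTTAGAACGTCGTGAATAA"
usage = third_position_usage(toy_cds)
```

Applied to the full 1 056 bp coding region, the value of `usage["G"]` would be the quantity compared with the reported 8.2%.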

Discussion

Organization of the IR of C. ellipsoidea cpDNA

Although the size of the IR varies from 76 kbp for geranium (Palmer et al. 1987) to 4.7 kbp for Dictyota dichotoma (Kuhsel and Kowallik 1987), all IRs so far known contain the rRNA operon; in other words, there is no evidence for an IR without rRNA genes. It is noteworthy that the algal IRs of Dictyota dichotoma (4.7 kbp, Kuhsel and Kowallik 1987), Pylaiella littoralis (6 kbp, Goër et al. 1988) and Cryptomonas (5.5-6 kbp, Douglas 1988) have room only for the rRNA operon. As determined in the present work, the IR of C. ellipsoidea contains the gene for psbA in addition to rrn (Figs. 1a and 3). The rrn-psbA cluster is also reported to exist in the IRs of Chlamydomonas reinhardii (Harris et al. 1987), C. eugametos and C. moewusii (Turmel et al. 1987); the last two also contain rbcL next to the cluster (Fig. 5). A similar rrn-psbA linkage occurs in the cpDNAs of Pylaiella (Goër et al. 1988) and Cryptomonas (Douglas 1988), where psbA is, however, in a single copy region immediately adjacent to the IR, which consists solely of rrn (Fig. 5). These structures suggest a definite relationship between algal cpDNAs, at least for unicellular green algae and some chlorophyll a/c algae. Based on these


Fig. 4. Amino acid sequences of the photosystem II thylakoid protein D1 or the 32 kDa QB-binding protein (psbA). The putative transmembrane regions are boxed. The chlorophyll- and non-heme iron-binding residues of His are underlined. Stars indicate residues conserved among all five proteins. An, Anacystis nidulans (Golden et al. 1986); Pr, Prochlorothrix hollandica (Morden and Golden 1989); Ce, Chlorella ellipsoidea; Cr, Chlamydomonas reinhardii (Erickson et al. 1984); Nt, Nicotiana tabacum (Sugita and Sugiura 1984)

observations, it is possible that the IR structure, characteristic of cpDNAs, might have originated from a duplication of the rRNA operon, with the duplicate arranged in an inverted array on the cpDNA. According to this idea, the Chlorella-Chlamydomonas type of IR could be formed from the Pylaiella and Cryptomonas types by a mechanism of expansion/contraction, as has been suggested to have operated in geranium cpDNA (Palmer 1985; Palmer et al. 1987). The direction of expansion in this example is from rrn to psbA (Fig. 5) and, interestingly, is the same as that of the replication of cpDNA from the origin located upstream of the 16S rRNA gene of Chlamydomonas. Although the exact position of the replication origin is not known for the cpDNAs of Pylaiella, Cryptomonas and Chlorella, an ARS sequence on the Chlorella cpDNA has been mapped to a position similar to that of the Chlamydomonas ori (Yamada et al. 1986). One mechanism that could account for the unidirectional expansion of the IR via replication and recombination is depicted in Fig. 6. Figure 6A shows a replicating cpDNA molecule with a fork containing the IR and its flanking single copy regions. Double reciprocal recombination in this molecule, involving one point within the IR (Palmer 1983; Palmer et al. 1985) and the


Fig. 5. Comparison of the large inverted repeat sequences (IRs) among various kinds of cpDNAs. Abbreviation of genes: ARS, autonomously replicating sequence of yeast; ori, replication origin; 16S and 23S, 16S and 23S rRNAs; psbA, photosystem II thylakoid protein D1 or the 32 kDa QB-binding protein; rbcL, large subunit of ribulose-1,5-bisphosphate carboxylase. Abbreviation of organisms: Ce, Chlorella ellipsoidea; Cr, Chlamydomonas reinhardii (Erickson et al. 1984); Ceu, Chlamydomonas eugametos (Turmel et al. 1987); Cm, Chlamydomonas moewusii (Turmel et al. 1988); CΦ, Cryptomonas Φ (Douglas 1988); Pl, Pylaiella littoralis (Goër et al. 1988)

Fig. 6A-C. Postulated mechanism of expansion of the large inverted repeat sequence (IR) of cpDNA. Arrows along the circular cpDNA represent IRs. A, a part of the cpDNA molecule is replicating from ori (*). Double reciprocal recombination, with one point within the IR and the other in a repetitive sequence (triangle) on the single copy region, will give a parental type molecule (B) and one with an extended IR (C) after the completion of replication

other in a repetitive sequence on the single copy region, would, after the completion of replication, give rise to two structurally different molecules: a parental type (B) and one with an extended IR (C). This model requires the presence of repetitive sequences within and around the IR to serve as recombinational hot spots. Indeed, as described below, such sequences were found on the cpDNA of C. ellipsoidea.

In order to extend the hypothesis of rrn as the origin of the IR to include higher plant cpDNAs, it would be interesting to look, among primitive algae and prochlorophytes, for a postulated cpDNA with an IR consisting of only rrn and single copy regions, whose portions immediately adjacent to the IR retain the higher plant gene order of the IR. Such a structure might represent a different line of evolution from that of Chlorella and Chlamydomonas.

Mechanism of rearrangements of the C. ellipsoidea IR

The IR region of C. ellipsoidea cpDNA includes various kinds of rearrangement involving insertions/deletions of small repeated sequences, possible transpositions (insertions) of ORFs with terminal repeated sequences (Yamada and Shimaji 1986a), and an inversion of a 5.0 kbp rrn region (Yamada and Shimaji 1987b). All of these rearrangements seem to have occurred via small repetitive sequences (for example, α- and σ-elements) in the genome (Yamada and Shimaji 1986a). Although α- and σ-elements were first found as the terminal repeated sequences of transposon-like structures, the entire nucleotide sequence of the Chlorella IR revealed that there are ten copies of α and ten copies of σ within and around the IR. In addition to these, a pair of inverted repeat sequences of 185 bp (β-elements) have been found within the IR in this study. A region of about 1.0 kbp (Fig. 2d, positions 354-1 319), between psbA and trnS, is interposed between the β-elements.

According to the IR expansion/contraction model of Fig. 6, there should be repetitive sequences in the vicinity of the endpoints of the IR. The 125 bp B sequence on the IR, flanked by both β- and σ-elements (Figs. 2d and 3), is a good candidate for such sequences, since B is also located adjacent to the endpoint of the IR (Fig. 2a). Between the tandemly repeated B-sequences, there is a run of 2 273 bp which contains the entire psbA gene (Fig. 3). This structure suggests that an expansion of the IR may have occurred, via B-sequences, to include psbA. In a similar way, the sequence of 597 bp next to psbA (positions 1 224-1 821), including trnS (GCU), is sandwiched by the tandemly repeated σ-linked sequences of 73 bp (G) as well as the tandem D-sequences (85 bp) shown in Fig. 2d, again suggesting an expansion of this region by the same mechanism. If so, the event involving trnS (GCU) must be older than that involving psbA, and both of them may have occurred serially in the same direction, possibly by DNA replication (Fig. 6). Since multiple copies of σ- and β-elements were detected by Southern hybridization in the single copy regions of C. ellipsoidea cpDNA (data not shown), a transpositional mechanism could account for the dispersed distribution.
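Direct repeats that sandwich a stretch of sequence, as described above for the B-, D- and G-sequences, can be located by indexing every window of a fixed length and reporting those that occur more than once. This is a minimal sketch that finds exact repeats only. The function names and the toy sequence are hypothetical, and it is not the procedure used in this work, which the text does not describe.

```python
from collections import defaultdict


def direct_repeats(seq: str, k: int) -> dict:
    """Map each k-mer that occurs at two or more positions to its 0-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


def spacer_length(starts: list, k: int) -> int:
    """Number of bases lying between the first two copies of a repeat of length k."""
    return starts[1] - (starts[0] + k)


# Hypothetical toy sequence: a 10-base repeat flanking an 8-base insert.
toy_seq = "GGATCCAAAA" + "CTGACGTA" + "GGATCCAAAA"
repeats = direct_repeats(toy_seq, 10)
spacer = spacer_length(repeats["GGATCCAAAA"], 10)
```

In the toy case the spacer corresponds to the sandwiched run, in the same sense as the stretch lying between the two B-sequences. Diverged repeats would need an approximate-matching method instead of exact k-mer lookup.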

In contrast with higher plants, algae (especially unicellular ones, such as Chlorella) seem to have much more freedom to rearrange their chloroplast genomes. The peculiar structure of the IR of C. ellipsoidea cpDNA provides an example, showing dynamic rearrangements mediated by multiple copies of the unique element.

Acknowledgements. The author wishes to thank Dr. M. Sugiura for plasmid pTB28 containing the tobacco psbA gene, Dr. R. Crouch for critical reading of the manuscript, Miss E Ozawa for synthetic oligonucleotides, and Miss R. Matsuura for typing the manuscript.


References

Calie PJ, Hughes KW (1987) Mol Gen Genet 208:335-341
Cattolico RA (1986) Trend Ecol Evol 1:64-67
Davis LG, Dibner MD, Battey JF (1986) Basic methods in molecular biology. Elsevier, New York, pp 80-229
Douglas SE (1988) Curr Genet 14:591-598
Erickson JM, Rahire M, Rochaix JD (1984) EMBO J 3:2753-2762
Gingrich JC, Buzby JS, Stirewalt VL, Bryant DA (1988) Photosyn Res 16:83-99
Goër SL, Markowicz Y, Dalmon J, Audren H (1988) Curr Genet 14:155-162
Golden SS, Brusslan J, Haselkorn R (1986) EMBO J 5:2789-2798
Harris EH, Boynton JE, Gillham NW (1987) In: O'Brien SJ (ed) Genetic Maps. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, pp 257-277
Hedberg MF, Huang YS, Hommersand MH (1981) Science 213:445-447
Horrander J, Kempe T, Messing J (1983) Gene 26:101-106
Kuhsel M, Kowallik KV (1987) Mol Gen Genet 207:361-368
Li N, Cattolico RA (1987) Mol Gen Genet 209:343-351
Manhart JR, Kelly K, Dudock BS, Palmer JD (1989) Mol Gen Genet 216:417-421
Maniatis T, Fritsch EF, Sambrook J (1982) Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, pp 363-402
Morden CW, Golden SS (1989) Nature 337:382-385
Ohyama K, Fukuzawa H, Kohchi T, Shirai H, Sano T, Sano S, Umesono K, Shiki Y, Takeuchi M, Chang Z, Aota S, Inokuchi H, Ozeki H (1986) Nature: 572-574
Padmanabhan U, Green BR (1978) Biochim Biophys Acta 521:67-73
Palmer JD (1983) Nature 301:92-93
Palmer JD (1985) Annu Rev Genet 19:325-354
Palmer JD, Stein DB (1986) Curr Genet 10:823-833
Palmer JD, Boynton JE, Gillham NW, Harris EH (1985) In: Arntzen C, Bogorad L, Bonitz S, Steinback KS (eds) Molecular biology of the photosynthetic apparatus. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, pp 269-278
Palmer JD, Nugent JM, Herbon LA (1987) Proc Natl Acad Sci USA 84:769-773
Reith M, Cattolico RA (1986) Proc Natl Acad Sci USA 83:8599-8603
Rochaix JD, Erickson J (1988) Trend Biochem Sci 13:56-59
Sanger F, Nicklen S, Coulson AR (1977) Proc Natl Acad Sci USA 74:5463-5467
Shinozaki K, Ohme M, Tanaka M, Wakasugi T, Hayashida N, Matsubayashi T, Zaita N, Chunwongse J, Obokata J, Yamaguchi-Shinozaki K, Ohto C, Torazawa K, Meng BY, Sugita M, Deno H, Kamogashira T, Yamada K, Kusuda J, Takaiwa F, Kato A, Shimada H, Sugiura M (1986) EMBO J 5:2043-2049
Sugita M, Sugiura M (1984) Mol Gen Genet 195:308-313
Turmel M, Bellemare G, Lemieux C (1987) Curr Genet 11:543-552
Turmel M, Lemieux B, Lemieux C (1988) Mol Gen Genet 214:412-419
Yamada T (1982) Plant Physiol 70:92-96
Yamada T (1983) Curr Genet 7:481-487
Yamada T (1988) Nucleic Acids Res 16:9865
Yamada T (1989) Nucleic Acids Res 17:4372
Yamada T, Shimaji M (1986a) Nucleic Acids Res 14:3827-3839
Yamada T, Shimaji M (1986b) Nucleic Acids Res 14:9529
Yamada T, Shimaji M (1987a) Curr Genet 11:347-352
Yamada T, Shimaji M (1987b) Mol Gen Genet 208:377-383
Yamada T, Shimaji M, Fukuda Y (1986) Plant Mol Biol 6:245-252
Yoshinaga K, Ohta T, Suzuki Y, Sugiura M (1988) Plant Mol Biol 10:245-250

Communicated by C. S. Levings III