Copyright © 2009 by the Genetics Society of America
DOI: 10.1534/genetics.109.104778

Signature of Diversifying Selection on Members of the Pentatricopeptide Repeat Protein Family in Arabidopsis lyrata

John Paul Foxe* and Stephen I. Wright†,1

*Department of Biology, York University, Toronto, Ontario M3J 1P3, Canada and †Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario M5S 3B2, Canada

Manuscript received May 8, 2009
Accepted for publication July 17, 2009

ABSTRACT

Pentatricopeptide repeat (PPR) proteins compose a family of nuclear-encoded transcriptional regulators of cytoplasmic genes. They have shown dramatic expansion in copy number in plants, and although the functional importance of many remains unclear, a subset has been repeatedly implicated as nuclear restorers for cytoplasmic male sterility. Here we investigate the molecular population genetics and molecular evolution of seven single-copy PPR genes in the outcrossing model plant Arabidopsis lyrata. In comparison with neutral reference loci, we find, on average, elevated levels of polymorphism and an excess of high-frequency variants at these PPR genes, suggesting that natural selection is maintaining polymorphism at some of these loci. This elevation in diversity persists when we control for divergence and generally decreases in the flanking regions, suggesting that these genes are themselves the targets of selection. Some of the PPR genes also demonstrate elevated population differentiation, which is consistent with spatially varying selection. In contrast, no comparable patterns are observed at these loci in A. thaliana, providing no evidence for the action of balancing selection in this selfing species. Taken together, these results suggest that a subset of PPR genes may be subject to balancing selection associated with ongoing cytonuclear coevolution in the outcrossing A. lyrata, which is possibly mediated either by intergenomic conflict or by compensatory evolution.

Despite the tight and ancient mutualism between cytoplasmic and nuclear genomes, conflicts of interest can repeatedly arise due to differences in modes of inheritance (Rand et al. 2004; Burt and Trivers 2006). For cytoplasmic genomes that experience maternal inheritance, mutations that enhance female fertility, even at a net cost to an individual’s survival and total reproduction, will spread via natural selection since they enhance cytoplasmic transmission (Gouyon and Couvet 1987; Frank 1989; Budar et al. 2003). One of the best lines of evidence for this type of cytonuclear conflict is found in the widespread phenomenon of cytoplasmic male sterility (CMS) in plants (Frank 1989; Schnable and Wise 1998; Budar et al. 2003), where male sterility encoded in a cytoplasmic gene leads to the spread of females in hermaphroditic plant populations. In most cases, CMS can be suppressed by nuclear-encoded restorer alleles (Bentolila et al. 2002; Brown et al. 2003; Desloire et al. 2003; Kazama and Toriyama 2003; Koizuka et al. 2003), which are selectively favored as pollen becomes limited in populations with high frequencies of CMS.

CMS has been documented in >150 plants and can persist as a reproductive polymorphism in natural populations (gynodioecy) or can appear following interspecific hybridization (Schnable and Wise 1998). The molecular basis for CMS in a number of systems has been well documented and seems to involve the mitochondrial genome in all cases. Expression of chimeric mitochondrial open reading frames that interfere with normal mitochondrial function and pollen development leads to the male sterile phenotype (Schnable and Wise 1998). Unexpectedly, nuclear restorer genes of CMS cloned from numerous divergent plant species appear to arise almost universally from the pentatricopeptide repeat (PPR) protein family (Bentolila et al. 2002; Brown et al. 2003; Desloire et al. 2003; Kazama and Toriyama 2003; Koizuka et al. 2003), which is defined by the presence of a degenerate 35-amino-acid motif. PPR genes exist in high copy numbers in plant genomes and are generally thought to act as gene-specific transcriptional regulators of cytoplasmic genes (Small and Peeters 2000; Lurin et al. 2004; Nakamura et al. 2004; Kotera et al. 2005). Although a growing number of PPR genes have been characterized (Schmitz-Linneweber and Small 2008), the functional importance of many of them remains unclear; most, if not all, are targeted to the chloroplast or mitochondria (Lurin et al. 2004). The molecular action of PPR genes varies and includes RNA editing, stability, cleavage, and splicing (Schmitz-Linneweber and Small 2008). Recent studies suggest that the expansion of PPR genes in flowering plants may be due to several waves of retrotransposition (O’Toole et al. 2008) and that a subset of PPR locations across the genome may be highly dynamic (Geddy and Brown 2007).

Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.109.104778/DC1.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. GQ343552-GQ344403.

1 Corresponding author: Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks St., Toronto, ON M5S 3B2, Canada. E-mail: [email protected]

Genetics 183: 663–672 (October 2009)

The evolutionary dynamics of CMS have been well documented in gynodioecious plant populations, which maintain both females and hermaphrodites within populations (Frank 1989; Hanson 1991; Charlesworth and Laporte 1998; Budar et al. 2003; Wright et al. 2008b). However, any outcrossing hermaphrodite plant population is susceptible to invasion by cytoplasmic mutants that increase female fertility by causing male sterility. Indeed, many instances of CMS have emerged from wide intraspecific or interspecific crosses, suggesting the exposure of a hidden evolutionary history of CMS and the fixation of nuclear restorers (Fishman and Willis 2006; Case and Willis 2008). Furthermore, it is possible that cryptic polymorphisms exist for cytoplasmic alleles that have more subtle effects on male fertility with nuclear-encoded suppressors of this activity. In contrast, we would not expect to find any evidence for cytonuclear conflicts in highly inbreeding species since pollen sterility mutations will reduce female fertility as well (Fishman and Willis 2006; Wright et al. 2008b).

A recent study of genomewide patterns of DNA sequence variation at 77 loci in the outcrossing model plant Arabidopsis lyrata provided preliminary evidence that a number of PPR genes show unusual diversity patterns consistent with the action of diversifying selection (Ross-Ibarra et al. 2008). In particular, the PPR genes included in the study generally exhibited high silent-nucleotide diversity, as well as high levels of between-population differentiation. One PPR locus, At1g74600, also exhibited significantly elevated levels of range-wide differentiation when applying a demographic model fit to the data, suggesting the action of disruptive selection and/or local adaptation on this locus. Interestingly, this locus, as well as several other PPR loci from this sample, shares detectable amino acid similarity with CMS fertility restorers in rice (see below). If there is diversifying selection acting on some PPR genes in A. lyrata, this may reflect ongoing evolutionary dynamics associated with cytonuclear conflict. To investigate this possibility further, we performed maximum-likelihood and further coalescent and permutation-based tests of selection on this complement of seven single-copy PPR genes and surveyed variation at flanking regions to test whether the PPR locus is the direct target of selection. In addition, we surveyed polymorphism at the same regions in the highly selfing A. thaliana. Our prediction was that, if selection associated with cytonuclear conflict drives the unusual diversity patterns, then we would not see comparable evidence for selection in flanking regions or in a related selfing species.

MATERIALS AND METHODS

Targeted PPR loci: We made use of a polymorphism data set from coding regions of 7 PPR genes and 50 “reference” genes from six populations of A. lyrata (Ross-Ibarra et al. 2008). To generate this sample of 57 loci, disease resistance genes, which have been hypothesized to be under pervasive selection, were removed for all analyses. The PPR genes studied included At1g03560, At1g59720, At1g74600, At2g28050, At2g36980, At3g62890, and At4g14190, with locus names based on designations from the A. thaliana genome project (Arabidopsis Genome Initiative 2000). These loci include representatives of all three major subclasses of PPR genes as identified by Lurin et al. (2004), including the P class (At1g03560, At2g28050, At4g14190), the E/E1 subclass (At1g74600 and At2g36980), and the DYW subclass (At1g59720 and At3g62890). Of these loci, BLASTx searches of the translated nucleotide sequences to the protein database (Altschul et al. 1997) reveal that four PPR genes (At1g03560, At1g74600, At2g28050, and At4g14190) show significant sequence homology to fertility restorers cloned in rice (Kazama and Toriyama 2003; Komori et al. 2004; Wang et al. 2006), although, in terms of motif arrangements, At1g74600 is not from the P class, which is the most common motif arrangement found in the vast majority of fertility restorers (Schmitz-Linneweber and Small 2008). Recently, At1g59720 was functionally characterized in A. thaliana as playing a role in RNA editing at multiple sites in the chloroplast (Okuda et al. 2009).

For this study, these seven loci were also sequenced in 48 accessions of A. thaliana (details below). Additionally, to expand our regional surveys of diversity around our target PPR fragments in A. lyrata, 7 primer pairs were designed from additional fragments within these genes, as well as 20 primer pairs in flanking genes (supporting information, Table S1). Flanking regions were chosen to be as physically adjacent as possible to characterize the heterogeneity in polymorphism across the focal region. All amplified regions were situated within exons and ranged in size from 200 to 700 bp. Three of these flanking loci (At1g74580, At1g74630, and At4g14170) were also found to be PPR loci. Primers for each of these flanking genes were designed using the A. thaliana genome (Arabidopsis Genome Initiative 2000). With the subsequent release of the A. lyrata genome assembly (http://genome.jgipsf.org/Araly1/Araly1.home.html), we were able to use BLAST searches to confirm the primer positions flanking our targeted PPR loci in this species and to calculate physical positions within A. lyrata.

Population samples: A. lyrata samples consisted of 65 individuals, originating from the six natural populations previously studied by Ross-Ibarra et al. (2008). The plants from which sequences were obtained include 6 individuals from Karhumaki, Russia (from O. Savolainen), 8 individuals from Stubbsand, Sweden (O. Savolainen), 15 individuals from Plech, Germany (T. Mitchell-Olds), 12 individuals from Esja Mountain, Iceland (E. Thorhallsdottir), 12 individuals from Indiana, United States (B. Mable), and 12 individuals from Rondeau Provincial Park, Ontario, Canada (B. Mable and S. Wright). The 48 A. thaliana individuals used in this study represent a subset of those used by Nordborg et al. (2005) (Table S2). With the exception of 2 populations, 2 individuals from each of 24 populations, representing a sampling across Europe and the midwestern United States, were used (Table S2). Following Nordborg et al. (2005), these 24 populations were then further categorized according to their broader geographic origin, including north Sweden, south Sweden, Central Europe, Europe, England, and the midwestern United States. DNA from a single plant from each maternal family was extracted using either the DNeasy plant DNA extraction kit (Qiagen, Valencia, CA) or the FastDNA plant DNA extraction kit (Qbiogene, CA). Extracted A. thaliana DNA was kindly provided by R. Gaut (University of California at Irvine).

PCR and sequencing: We employed a direct sequencing strategy, using single large exons for PCR amplification and sequencing to minimize the chance of unreadable sequences caused by insertion/deletion variants as described in Ross-Ibarra et al. (2008). Each exon was submitted to a BLAST search of the genomic survey sequence database (http://www.ncbi.nlm.nih.gov/BLAST/) to check for the presence of orthologous regions in the shotgun genome sequence of Brassica oleracea (Altschul et al. 1997). These orthologous regions were aligned to the A. thaliana genomic sequence to identify conserved regions for primer design. PCR primers were designed with the aid of PrimerQuest (Integrated DNA Technologies; http://biotools.idtdna.com/primerquest/) to amplify 650- to 750-bp fragments for sequencing using the same forward and reverse primers. Primers were also submitted to a BLAST search against the A. thaliana genome to ensure amplification of a single-copy region.

PCR reactions were carried out in 25-μl volumes on an Eppendorf Mastercycler. The cycling conditions were as follows: an initial 2 min at 94°, followed by 35 cycles of 20 sec at 94°, 20 sec at 55°, and 40 sec at 72°, with a final extension time of 4 min at 72°.

Sequencing reactions were carried out by Lark Technologies (Houston, TX). Chromatograms were analyzed using Sequencher 4.6, using the “call secondary peaks” option to aid in the identification of heterozygous sites. All chromatograms were checked manually for heterozygous nucleotide positions, using the sequence from both strands to confirm putative heterozygous sites. Nucleotide sequences have been deposited in GenBank (accession nos. GQ343552–GQ344403).

Gene family exclusion: BLAST searches of our fragments to the A. lyrata genome assembly (http://genome.jgipsf.org/Araly1/Araly1.home.html) confirmed that our PPR genes and flanking regions were single copy, meaning that nucleotide divergence was sufficiently high with other gene family members that cross-amplification was highly unlikely. Furthermore, none of the PPR genes showed a pattern of fixed heterozygosity (i.e., all samples showed a heterozygous base) from direct sequencing, suggesting that we had successfully amplified single-copy loci. In addition, each of the seven PPR genes was cloned from each of two heterozygous A. lyrata individuals to again ensure the absence of a gene family. PCR was performed on each PPR gene in two different individuals, and the product was cleaned using a Qiagen QIAquick PCR purification kit. The cleaned PCR product was visualized on a 1% agarose gel, and a ligation reaction was carried out using an Invitrogen TA cloning kit. Transformation reactions were performed using Invitrogen One Shot Top Ten competent Escherichia coli cells. The cell cultures were grown on ampicillin-resistant X-GAL-stained agarose plates. White colonies were screened using a colony PCR technique, and 8–10 positive clones for each individual were sequenced by Lark Technologies. Analysis of the generated sequence revealed segregation of two haplotypes, while any deviations may be accounted for by PCR recombination.

Finally, results from segregation analysis in the mapping population (Hansson et al. 2006) confirmed that we were amplifying single genomic regions, and not gene duplicates, for the following loci: At1g03560, At1g59720, and At1g74600. From the combination of these results, we are reasonably confident that we have successfully amplified only single-copy loci.

Data analysis: Synonymous and nonsynonymous sites were identified by aligning each fragment to the corresponding fragment in the A. thaliana genome sequence, identified using BLAST (Altschul et al. 1997), and by using the protein annotation from A. thaliana. Standard population genetic analyses of the sequence data were carried out using both the program DNAsp Version 4.0 (Rozas et al. 2003) and a modified version of Polymorphurama, a Perl script written by Bachtrog and Andolfatto (2006). Significant departures at individual PPR genes were assessed by comparing the average pairwise differences (π) (Tajima 1989) and Tajima’s D (Tajima 1989) at both synonymous and nonsynonymous sites for each locus to the empirical null distribution for the 50 reference genes. In addition, permutation tests for mean summary statistics across PPR genes were conducted by resampling 7 loci from the full 57-gene data set, including PPR loci and reference genes from A. lyrata. Using 100,000 permutations, we calculated the proportion of times the mean value from 7 permuted loci was as high or higher than the observed mean from the PPR genes. In A. thaliana, permutation tests for summary statistics were conducted by resampling 7 loci from a 116-locus reference gene data set, which was generated by using a subsample of published data from Nordborg et al. (2005). These subsampled loci were selected to have a minimum of 80 synonymous sites for polymorphism analysis. For each of these 116 loci, polymorphism data from coding regions were analyzed from the same set of 48 individuals used in our PPR resequencing study.
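The permutation procedure described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Perl code used for the analyses: the function name, the fixed random seed, the reduced number of permutations, and the toy diversity values are all hypothetical.

```python
import random

def permutation_p_value(all_loci, focal_loci, n_perm=100_000, seed=1):
    """One-tailed permutation P value for a mean summary statistic.

    all_loci   : per-locus statistic for the full data set (focal + reference)
    focal_loci : per-locus statistic for the focal class (e.g., the PPR genes)
    Returns the proportion of permutations in which the mean of a random draw
    of len(focal_loci) loci is as high or higher than the observed focal mean.
    """
    rng = random.Random(seed)            # fixed seed keeps the sketch reproducible
    k = len(focal_loci)
    observed = sum(focal_loci) / k
    hits = 0
    for _ in range(n_perm):
        draw = rng.sample(all_loci, k)   # resample k loci without replacement
        if sum(draw) / k >= observed:
            hits += 1
    return hits / n_perm

# Toy values only (hypothetical, not data from the study):
reference = [0.010 + 0.0005 * i for i in range(50)]   # 50 "reference" loci
focal = [0.06, 0.08, 0.09, 0.06, 0.02, 0.06, 0.07]    # 7 "focal" loci
p = permutation_p_value(reference + focal, focal, n_perm=10_000)
print(f"P = {p:.4f}")
```

Because the focal loci are part of the pool that is resampled, the test asks how often a random set of 7 loci from the whole data set looks at least as extreme as the focal set, which is the comparison the text describes.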

Levels of differentiation, as measured using Wright’s (Wright 1969) FST, were calculated using estimates of the average pairwise synonymous differences, and πsyn was calculated using a Perl script for each of the six A. lyrata populations and for the total A. lyrata data set. Loci for which we had data in at least 10 chromosomes in each population were included in the estimates. In A. thaliana, each of the populations (as defined by Nordborg et al. 2005) consisted of six to eight individuals, and πsyn was calculated using DNAsp Version 4.0 (Rozas et al. 2003). To estimate global differentiation, we calculated FST using the formula of Hudson et al. (1992). In addition, FST values for each pair of populations were calculated using a Perl script written by J. Ross-Ibarra (University of California, Davis, CA).
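As a rough illustration of a pairwise-difference estimator of FST, the sketch below assumes the commonly used form FST = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb is the mean number between populations, with populations weighted equally. The weighting actually used in the study is not stated here, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations, product

def n_diff(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def hudson_fst(populations):
    """FST = 1 - Hw/Hb from aligned sequences grouped by population.

    Assumes every population has at least two sequences and that the
    populations are not all identical (so that Hb > 0).
    """
    # Hw: average within-population pairwise difference, averaged over populations
    hw = mean(
        mean(n_diff(a, b) for a, b in combinations(pop, 2))
        for pop in populations
    )
    # Hb: average between-population pairwise difference, averaged over population pairs
    hb = mean(
        mean(n_diff(a, b) for a, b in product(p1, p2))
        for p1, p2 in combinations(populations, 2)
    )
    return 1 - hw / hb

# Toy alignment (hypothetical): two populations of two sequences each
toy = [["AAAA", "AAAT"], ["TTTT", "TTTA"]]
fst = hudson_fst(toy)
print(f"FST = {fst:.3f}")
```

Restricting the input to synonymous sites before calling the function would give the synonymous estimate; the same function applied to each pair of populations gives pairwise values.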

A maximum-likelihood HKA (mlHKA) test was performed using the mlHKA program (Wright and Charlesworth 2004). This model is based on the HKA test that evaluates the fit of polymorphism and divergence to expectations under the neutral theory (Hudson et al. 1987). Under the neutral theory, within-species diversity should correlate with between-species divergence (Kimura 1983). The HKA test utilizes polymorphism data from within a single species and sequence divergence data from a related species to compare the relative amounts of polymorphism and divergence across multiple loci. The mlHKA test allows for a test of selection at individual loci or for a class of genes in a similar multilocus framework. Between-species divergence was calculated using Jukes–Cantor-corrected synonymous divergence from A. thaliana. We restricted this analysis to those loci for which we had sequence information for individuals from each of the six populations in A. lyrata. The program was run under a strictly neutral model for a total of 2 million chains, followed by a selection model in which the 7 PPR genes were designated candidate genes for the action of selection, again for a total of 2 million chains. Significance was assessed using the likelihood-ratio test where twice the difference in log likelihood between the models is approximately chi-squared distributed with 7 d.f., corresponding to the difference in the number of parameters between the neutral model and the selection model. Note that, if only a subset of the PPR genes is under selection, this test, which assumes the independent action of selection on each PPR locus, should be conservative.
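The likelihood-ratio comparison at the end of this paragraph can be illustrated with a short Python sketch. It does not reproduce the mlHKA program; it only shows how twice the log-likelihood difference would be converted to a P value with 7 d.f. The function names and the two log-likelihood values are hypothetical, and the chi-squared tail probability is computed with a standard recurrence that holds for odd degrees of freedom.

```python
from math import erfc, exp, gamma, sqrt

def chi2_sf_odd_df(x, df):
    """Upper-tail probability of a chi-squared variate with an odd number of d.f.

    Uses the recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x / 2)).
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch only handles odd d.f.")
    if x <= 0:
        return 1.0
    q = erfc(sqrt(x / 2))
    k = 1
    while k < df:
        q += (x / 2) ** (k / 2) * exp(-x / 2) / gamma(k / 2 + 1)
        k += 2
    return q

def likelihood_ratio_test(lnl_neutral, lnl_selection, df=7):
    """Return (test statistic, P value) for nested neutral vs. selection models."""
    stat = 2 * (lnl_selection - lnl_neutral)
    return stat, chi2_sf_odd_df(stat, df)

# Hypothetical log likelihoods, chosen only to illustrate the calculation:
stat, p = likelihood_ratio_test(lnl_neutral=-250.0, lnl_selection=-240.0)
print(f"2 x delta lnL = {stat:.1f}, P = {p:.4f}")
```

With 7 d.f., the statistic must exceed roughly 14.07 to reach P < 0.05, which is why a test that spends one parameter on every candidate locus is conservative when only some loci are under selection.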

Although mlHKA tests for unusual patterns of diversity while controlling for divergence, the demographic history of A. lyrata may have inflated the variance in diversity beyond that assumed under the standard neutral model. To test for elevated diversity at PPR genes while controlling for both divergence and demography, we therefore employed coalescent simulations under the demographic model for these populations as inferred by Ross-Ibarra et al. (2008). Briefly, this model assumes an ancestral population that splits into six daughter populations, each of which experiences a bottleneck, followed by a recovery to current population sizes. We simulated this demographic history using point estimates of each parameter in the model (see Table 1 of Ross-Ibarra et al. 2008). Each locus was simulated using this model, but the population mutation parameter was rescaled for each gene by a factor Ks_i/Ks_average, where Ks_i is the per-site synonymous divergence between A. lyrata and A. thaliana at locus i and Ks_average is the average synonymous divergence across loci. In this way, each locus has a distinct mutation rate estimated by the amount of between-species divergence. Allowing for locus-specific mutation rates provides a more conservative test for selection, since the variance across loci in diversity can be explained by mutation rather than by selection. We ran simulations 10,000 times for each of the 53 loci used in the mlHKA analysis using the program msstats, which reads in data from the coalescent simulation program ms and calculates several common summary statistics (Hudson 2002; Thornton 2003). For each simulated data set, we calculated π and Tajima’s D values. We then calculated the average π and Tajima’s D values at the seven simulated PPR genes and compared them with the global multilocus average. These ratios were calculated for our observed data at synonymous sites, and significance was assessed by comparing to the simulated distribution. In addition, observed πsyn and Tajima’s D synonymous values of each individual PPR locus were compared with the simulated distribution for that locus to assess significance for individual loci.
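The rescaling of the mutation parameter and the comparison of an observed value with a simulated distribution can be sketched as follows. This is a minimal illustration, not the ms/msstats pipeline: the function names, the baseline value of theta, and the toy divergence proportions are hypothetical. The Jukes–Cantor correction is included because divergence was Jukes–Cantor corrected in the mlHKA analysis; whether the same corrected values were used for the rescaling is an assumption.

```python
from math import log

def jukes_cantor(p):
    """Jukes-Cantor corrected divergence from the observed proportion p of differing sites."""
    return -0.75 * log(1 - 4 * p / 3)

def rescale_theta(theta, ks_per_locus):
    """Scale a baseline population mutation parameter by Ks_i / Ks_average for each locus."""
    ks_mean = sum(ks_per_locus) / len(ks_per_locus)
    return [theta * ks / ks_mean for ks in ks_per_locus]

def empirical_p_value(observed, simulated):
    """Proportion of simulated values that are as high or higher than the observed value."""
    return sum(s >= observed for s in simulated) / len(simulated)

# Toy inputs (hypothetical): observed synonymous differences per site at three loci
ks = [jukes_cantor(p) for p in (0.10, 0.12, 0.14)]
thetas = rescale_theta(0.01, ks)          # one locus-specific theta per locus
print([round(t, 5) for t in thetas])
print(empirical_p_value(3, [1, 2, 3, 4]))
```

Loci with above-average divergence receive a proportionally larger mutation parameter, so high diversity at a fast-evolving locus is less likely to be flagged as evidence of selection.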

RESULTS

Levels of diversity and differentiation at PPR genes in A. lyrata and A. thaliana: Levels of diversity and differentiation were calculated for both PPR genes and genomewide using 57 (50 reference and 7 PPR) nuclear-encoded loci in A. lyrata (as described by Ross-Ibarra et al. 2008) and 116 nuclear-encoded loci in A. thaliana (as described by Nordborg et al. 2005). Each of the 7 PPR genes in this study was resequenced in 48 individuals in A. thaliana, representing a subset of those population samples used by Nordborg et al. (2005) to allow for direct comparison of the levels of diversity and differentiation in PPR genes in these two different species (Table S2).

Table 1 shows synonymous and nonsynonymous nucleotide diversity for the seven original PPR gene fragments and the genomewide patterns for both A. lyrata and A. thaliana. In A. lyrata, both synonymous and nonsynonymous diversity levels were significantly elevated at PPR genes in comparison with the genomic distribution. The mean value of both πsyn and πnonsyn at PPR genes showed a significant elevation compared to a permuted mean from seven random loci from the entire polymorphism data set (P < 0.001). Individually, six of the seven PPR genes showed significantly elevated πsyn values when compared to the genomewide distribution, including all four loci that share sequence homology with rice fertility restorers (At1g03560, At1g74600, At2g28050, and At4g14190). In contrast, there was no significant increase in the levels of synonymous diversity at PPR genes in A. thaliana. However, πnonsyn at PPR genes showed a significant elevation compared to a permuted mean from seven random loci in the A. thaliana reference polymorphism data set (P < 0.05).

TABLE 1

Levels of diversity and differentiation at PPR genes and genome averages in A. lyrata and A. thaliana

               πsyn(a)                   πnonsyn                   Tajima’s D syn(b)         Tajima’s D nonsyn         FST syn(c)                FST nonsyn
Locus          A. lyrata    A. thaliana  A. lyrata    A. thaliana  A. lyrata    A. thaliana  A. lyrata    A. thaliana  A. lyrata    A. thaliana  A. lyrata    A. thaliana
At1g03560      0.057***     <0.001       0.009        0.001        2.946*       NA           2.113        0.076        0.454        0.000        0.530        0.313
At1g59720      0.085***     0.012        0.022        0.015        0.234        1.156        0.561        2.405        0.396        −0.011       0.631        0.006
At1g74600      0.090***     <0.001       0.012        0.001        3.761***     NA           3.773        −1.075       0.697        0.000        0.779        0.162
At2g28050      0.062***     0.002        0.013        <0.001       1.287        −1.764       0.182        −1.868       0.592        0.325        0.600        0.750
At2g36980      0.019        0.005        0.008        0.001        0.964        −0.246       2.975        −1.005       0.442        0.323        0.511        0.237
At3g62890      0.057***     <0.001       0.014        <0.001       1.751        −1.111       2.687        NA           0.617        0.032        0.739        NA
At4g14190      0.069***     0.001        0.008        0.002        1.803        −1.576       −0.199       −0.577       0.423        −0.068       0.521        0.067
PPR average    0.063*       0.003        0.012        0.003        1.821***     −0.708       1.727        −0.341       0.517        0.086        0.616        0.256
Permuted mean  0.019        0.006        0.005        0.002        0.442        −0.444       −0.082       −0.762       0.399        0.355        0.488        0.001

Estimates of synonymous and nonsynonymous nucleotide diversity, frequency of variants and pairwise population differentiation at PPR gene fragments, and genome averages in A. lyrata and A. thaliana. Statistically significant values: p < 0.05 using permutation tests is indicated by boldface type; ***, p < 0.001 using coalescent simulations conducted only for synonymous sites in A. lyrata; *, p < 0.05 using coalescent simulations conducted only for synonymous sites in A. lyrata.
(a) Synonymous and nonsynonymous nucleotide diversity as measured by πsyn and πnonsyn, where π is the average number of pairwise differences between two individuals.
(b) Frequency of variants in each of the PPR genes and genome averages in A. lyrata and A. thaliana as measured by calculating Tajima’s D synonymous and nonsynonymous.
(c) Synonymous and nonsynonymous pairwise population differentiation estimates as measured by the population differentiation parameter FST calculated using πsyn and πnonsyn.

Synonymous and nonsynonymous nucleotide diversity for the 7 PPR gene fragments and the genomewide patterns for A. lyrata were also calculated for each of the six A. lyrata populations used in this study (Table S3). The mean value of both πsyn and πnonsyn at PPR genes showed a significant elevation compared to the genome average in Iceland, Germany, and Sweden. πnonsyn at PPR genes also showed a significant elevation compared to the genome average in Russia. When examining individual genes, 12 of 42 (29%) gene-population combinations showed elevated synonymous diversity, while none showed significantly reduced diversity within populations (Table S3). One caveat with this latter result is that low within-population diversity is common in the reference genes (Ross-Ibarra et al. 2008), providing little power to detect unusually reduced polymorphism. In addition, 11 of 42 (26%) gene-population comparisons showed elevated nonsynonymous diversity (Table S3).

We investigated the site frequency spectrum in each of the PPR genes by calculating Tajima’s D at synonymous and nonsynonymous sites (listed in Table 1). Both the mean synonymous and nonsynonymous Tajima’s D for PPR genes were significantly elevated compared with the permuted mean of the entire multilocus data set in A. lyrata (P < 0.001). In addition, At1g03560 and At1g74600 showed individually significant Tajima’s D values in comparison with reference genes (Table 1). Within populations, Tajima’s D was less often significant (Table S3); only 2 of 42 (5%) gene-population comparisons were significant at synonymous sites, and the mean Tajima’s D was not significantly elevated within individual populations. However, the average nonsynonymous Tajima’s D was significantly elevated in Germany. No elevation of Tajima’s D was observed in A. thaliana.
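For readers unfamiliar with the statistic, the following sketch computes Tajima’s D (Tajima 1989) from a set of aligned sequences. It is a simplified illustration: it treats every column of the alignment alike, whereas the analysis above partitions sites into synonymous and nonsynonymous classes, and the function name and input format are assumptions made for the example.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a list of equal-length aligned sequences (strings)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 sequences")
    length = len(seqs[0])

    # S: number of segregating (polymorphic) alignment columns.
    seg_sites = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences between two sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

An excess of intermediate-frequency variants inflates π relative to S and gives positive D, whereas an excess of rare variants gives negative D.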

Levels of population differentiation at synonymous and nonsynonymous sites, as measured using FST, are shown in Table 1. Although no individual locus showed elevated FSTsyn compared with the empirical null distribution, the mean FSTsyn for PPR genes in A. lyrata was significantly elevated compared with the genome average (Table 1; P < 0.05). Although FSTnonsyn was not significantly elevated, the trend suggested a mean value higher than the permuted mean (Table 1). Neither FSTsyn nor FSTnonsyn was significantly elevated for PPR genes in A. thaliana, and, as with A. lyrata, the trend suggested a mean value higher than the permuted mean (Table 1). Three loci, however, At1g03560, At2g28050, and At2g36980, individually displayed significantly elevated FSTnonsyn values in A. thaliana.

Figure 1. Pairwise population differentiation at synonymous and nonsynonymous sites. The population differentiation parameters FSTsyn (A) and FSTnonsyn (B) for both PPR genes and the genome average were calculated for each pair of populations in A. lyrata. FST values were calculated using estimates of the average pairwise synonymous differences, πsyn (A), and estimates of the average pairwise nonsynonymous differences, πnonsyn (B), where π is the average number of pairwise differences between two individuals. Statistical significance as estimated using permutation tests is indicated by an asterisk.

FSTsyn for both PPR genes and the genome average was calculated for each pair of populations in A. lyrata. Of the 15 possible population combinations, the average FSTsyn in PPR genes was significantly elevated above that of the genome average in Canada–Russia, Germany–Canada, Germany–United States, and Russia–United States, while FSTnonsyn in PPR genes was significantly elevated in Canada–Russia, Germany–Canada, Germany–United States, and Sweden–United States (Figure 1). Thus, in both cases, ~30% of FST values are significantly elevated, although it should be emphasized that these tests are not independent, given the recurrent use of individual populations.
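A pairwise FST based on average pairwise differences can be sketched as follows. The estimator shown, FST = 1 − πwithin/πbetween, is one common π-based form (compare Hudson et al. 1992); whether this exact form was used here is an assumption, as are the function names and the equal weighting of the two populations.

```python
from itertools import combinations, product


def mean_pairwise_differences(pairs):
    """Mean number of differing sites over an iterable of sequence pairs."""
    pairs = list(pairs)
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def pairwise_fst(pop1, pop2):
    """FST = 1 - pi_within / pi_between for two lists of aligned sequences.

    Each population must contain at least two sequences so that a
    within-population comparison exists.
    """
    within = (
        mean_pairwise_differences(combinations(pop1, 2))
        + mean_pairwise_differences(combinations(pop2, 2))
    ) / 2.0
    between = mean_pairwise_differences(product(pop1, pop2))
    if between == 0:
        return 0.0  # no differences at all, so no differentiation
    return 1.0 - within / between
```

Two populations fixed for different alleles give FST = 1, while samples drawn from the same pool give values near or below zero; small negative estimates are a known property of this form.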

Estimates of diversity calculated while controlling for divergence: While our initial analysis suggested elevated diversity at PPR genes in A. lyrata, it is possible that this increase in diversity is due either to the action of balancing selection maintaining polymorphism at these loci or to an increased mutation rate. While the elevated Tajima’s D and population differentiation suggest diversifying selection, we wanted to investigate further the possibility of an elevated mutation rate, particularly since the variance in FST and Tajima’s D can be influenced by locus-specific heterozygosity (Tajima 1989; Beaumont and Nichols 1996), which could potentially influence our statistical comparisons of PPR genes with reference genes.

If diversity was elevated in PPR genes due to a higher mutation rate, we would expect this to be reflected in high divergence at these genes. As shown in Figure 2, diversity still appears elevated in PPR genes compared to the genome average in A. lyrata when compared with loci having similar divergence. To explicitly test for elevated diversity at PPR genes controlling for divergence, we employed an mlHKA test (Wright and Charlesworth 2004). The mlHKA test is based on the HKA test, which uses multilocus nucleotide data to test for neutral evolution by comparing within-species diversity and between-species divergence (Hudson et al. 1987). Under this framework, a model allowing selection on PPR genes as a class shows a significant improvement over a model that assumes that all genes are neutral (Table 2). Estimates of the parameter k, the degree to which diversity is decreased or increased relative to neutral expectation, show values generally > 1, indicating a general elevation of diversity relative to divergence (Table 2).
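The model comparison can be illustrated with a short likelihood-ratio calculation. The sketch assumes the usual chi-square approximation for twice the difference in log likelihood, uses the lnL values and degrees of freedom reported in Table 2, and relies on a closed-form tail probability that holds only for even degrees of freedom; the function names are illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (statistic, P value) for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(stat, df)


# lnL values and d.f. as reported in Table 2 (neutral vs. selection model).
stat, p_value = likelihood_ratio_test(-302.94, -294.21, df=6)
```

The statistic recovered from the rounded lnL values is close to the 17.462 given in Table 2, and the resulting P value falls below 0.01, in agreement with the table.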

Figure 2. Estimates of synonymous diversity (πsyn, the average pairwise differences) vs. per-site synonymous divergence (Ks) in A. lyrata.

TABLE 2

Maximum-likelihood analysis of silent polymorphism in A. lyrata

Model                       lnL(a)     Likelihood-ratio statistic (d.f.)   P value
Neutral (all 1)             −302.94
Selection on 7 PPR genes    −294.21    17.462 (6)                          <0.01

k value(b)                  AT1G03560  AT1G59720  AT1G74600  AT2G28050  AT2G36980  AT3G62890  AT4G14190
Neutral (all 1)             1          1          1          1          1          1          1
Selection on 7 PPR genes    1.30109    3.12742    1.71626    1.66076    0.832818   1.12705    1.89979

The model was run under two models: a neutral model and a selection model in which the PPR genes are the target of selection.

a lnL is the log-likelihood value for each model.

b k value estimates of each of the PPR genes in A. lyrata as calculated using mlHKA, where k is a measure of the degree to which diversity is significantly elevated or reduced in comparison with neutrality.


Although mlHKA controls for differences across loci in mutation rate, it does not account for inflated variance in diversity due to demographic history. In particular, previous work has inferred that A. lyrata populations have experienced recent severe population bottlenecks, which may contribute to excess polymorphism at a subset of loci. To explicitly control for this, we made use of demographic inferences from the approximate Bayesian computation implemented for these populations by Ross-Ibarra et al. (2008). We simulated the inferred population bottlenecks and subsequent recovery in six independent populations, all derived from a single ancestral population, and accounted for differences in population mutation rate by scaling each locus by synonymous divergence. Consistent with our other approaches, the observed ratio of synonymous diversity at PPR genes relative to the total data set is significantly elevated compared with simulated data sets, and six of seven individual loci show evidence for significantly elevated polymorphism (Table 1, Figure S1). In addition, the average Tajima’s D and the Tajima’s D value at At1g03560 and At1g74600 show significant elevation (Table 1, Figure S2).

Diversity estimates at PPR genes and their flanking regions: According to our hypothesis, PPR genes should be the direct target of selection, and diversity levels should be lower in genes surrounding them. To test this, we analyzed diversity in the regions surrounding each of the seven PPR genes in A. lyrata. In each case, diversity was calculated in 100-bp overlapping windows. Figure 3 shows πsyn at each of the seven PPR gene fragments used in this study as well as πsyn at seven fragments within these PPR loci and 20 fragments adjacent to the PPR genes. For a number of loci, particularly At1g03560, At1g59720, At1g74600, and At2g28050, πsyn can be seen to peak at the PPR gene fragment and decay with distance from the PPR gene.
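A sliding-window diversity scan of the kind described above might be sketched as follows. The 100-bp window comes from the text; the 50-bp step is a hypothetical value, since the overlap is not stated, and the sketch computes per-site π over all alignment columns rather than over synonymous sites only.

```python
from itertools import combinations


def pi_per_site(seqs):
    """Mean pairwise differences per site for equal-length aligned sequences."""
    pairs = list(combinations(seqs, 2))
    length = len(seqs[0])
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)


def sliding_window_pi(seqs, window=100, step=50):
    """Return (window start, per-site pi) for overlapping windows.

    `step` is an assumed value; any step smaller than `window`
    produces overlapping windows.
    """
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start, pi_per_site(chunk)))
    return results
```

Plotting the second element of each tuple against the window start gives a profile in which a localized peak over the gene, decaying into the flanks, would be the pattern expected under selection acting on the gene itself.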

DISCUSSION

A number of PPR loci surveyed were found to display elevated levels of polymorphism, excess high-frequency variants, and in some cases elevated among-population differentiation in comparison with neutral loci in A. lyrata. Taken together, these results suggest the action of diversifying selection on some PPR genes in this species. Given the putative function of these loci as regulators of cytoplasmic genes and the sequence similarity of some of them to fertility restorers in CMS, our results suggest that there may be selection associated with ongoing coevolution between cytoplasmic and nuclear genes. The lack of comparable evidence for selection on these loci in the highly selfing A. thaliana is at least consistent with the hypothesis that these unusual diversity patterns are reflective of cytonuclear conflict in the outcrossing A. lyrata. The one exception is evidence for a slight but significant excess of nonsynonymous polymorphism in A. thaliana; given this pattern in the absence of any other evidence for selection, this could reflect a relaxation of selective constraint on this class of proteins relative to the other genes sequenced.

Figure 3. Silent diversity estimates (πsyn, the average pairwise differences) at each of the seven PPR genes and their flanking regions in A. lyrata (A–G). PPR loci (including those flanking genes deemed to be PPR loci) are labeled in A–G. Solid lines indicate the position of the PPR gene coding sequence. Open boxes represent the positions of the PPR motifs within each PPR gene. Horizontal dashed lines represent the A. lyrata genome average value of πsyn. Vertical hatched lines indicate gaps in physical location for which sequence data were not obtained.


A dominant challenge in molecular population genetic studies is distinguishing the role of locus-specific selection from other factors that can create a high variance in diversity, such as population history and variation in mutation rates (Wright and Gaut 2005). In this study, we used four approaches to test for selection on PPR genes in these populations while controlling for other potential factors influencing diversity. First, our nonparametric approach made use of the multilocus data set to generate an empirical null distribution against which to test candidate genes. This approach indicated that the average diversity, Tajima’s D, and FST values for our PPR genes show significant elevation compared with the reference loci and that six of our seven PPR genes show individual diversity levels in excess relative to the empirical null distribution. Since demographic factors are not expected to act specifically on a particular class of gene, this approach should be robust to the historical factors inflating the variance in diversity across loci. Second, our analysis of flanking regions generally provides supporting evidence that the elevated diversity is particularly localized at the PPR genes themselves. Third, mlHKA suggests that mutation rate differences alone cannot account for the excess diversity. Finally, our simulation-based approach that controls for both variation in mutation rate across loci and the inferred demographic history supports our conclusions that this class of gene shows unusually elevated diversity.

An advantage of the fourth approach applying coalescent simulations is that both demographic history and heterogeneity in mutation rates are controlled for in the same analysis. Although the simulations performed by Ross-Ibarra et al. (2008) suggested that the demographic model alone provided a good fit to both the mean and the variance of diversity statistics, these simulations allowed for locus-specific mutation rates inferred only from diversity patterns. The previous analysis was thus not constrained in plausible mutation rates by divergence between species. In other words, the previous simulation approach allowed all differences in levels of diversity across loci to be explained by differences in mutation rate. Our results for PPR genes here suggest that the demographic model alone may not explain the entire variance in diversity statistics, once mutation rates are scaled by divergence. As multilocus data sets accumulate for this species, it will be interesting to re-explore the extent to which heterogeneity in diversity patterns in A. lyrata can be explained by demographic history alone, or whether hitchhiking is common across the genome.

In this study, we have taken the approach of studying PPR genes as a class, since neutral demographic processes are not expected to affect a particular set of genes differentially, which may allow for increased power to detect recurring selection acting on a gene family. Since there can be low power to detect many types of selection (Thornton and Jensen 2007), examining a set of genes subject to shared selective history should enhance the ability to detect selection (Bakker et al. 2006). Unsurprisingly, however, the signature of balancing selection is not consistent across all of our PPR genes. At2g36980, for example, does not show any patterns suggesting elevated diversity. Furthermore, only two individual loci show significantly elevated Tajima’s D values, although an additional three show values considerably higher than the multilocus average. Finally, several PPR genes flanking our target loci do not show prominent signatures of selection (Figure 3). PPR genes, many of which show severe knockout phenotypes, are thought to play a role in post-transcriptional processes through an RNA-binding mechanism and have been implicated in a variety of essential functions. Thus many of these loci are likely to be subject to neutral patterns of diversity and molecular evolution and may not be subject to coevolutionary interactions with their cytoplasmic targets.

High nucleotide diversity in range-wide population samples can be indicative of two types of selective processes, collectively called balancing selection by molecular population geneticists (Charlesworth 2006). First, frequency-dependent selection and heterozygote advantage can selectively maintain high polymorphism within populations and may (although not necessarily) also distort the site frequency spectrum toward an excess of high-frequency variants (Charlesworth 2006). Second, local adaptation and spatially varying selection across populations can also act to maintain high levels of species-wide diversity, giving a signature of balancing selection (Charlesworth 2006). Given the evidence for a general excess of both diversity and differentiation, spatially varying selection may predominate. On the other hand, we did observe high within-population polymorphism at several PPR loci, and single-locus tests reveal little sign of high differentiation (Table S3). Nevertheless, simulation results indicate that local selection coupled with migration can also lead to excess within-population polymorphism (Charlesworth et al. 1997), and the power to detect local adaptation from single-locus tests may be low in a highly structured species such as A. lyrata (Ross-Ibarra et al. 2008).

One possibility for a mode of selection on some PPR genes is local selection for regulation of population-specific cytoplasmic alleles. Given the evidence for elevated between-population differentiation, particularly between European and North American population pairs, it is possible that distinct cytoplasmic variants arise in individual regions, contributing to regional directional selection on PPR alleles. An alternative possibility is frequency-dependent selection; increases in the frequency of cytoplasmic mutants that reduce male fertility, either quantitatively or qualitatively, could select for rare PPR alleles. As the suppressor increases in frequency, this favors rare cytoplasmic variants, and thus the rare restorer alleles, and could act to maintain variation over long evolutionary timescales.

Our primary hypothesis is that PPR genes are under selection mediated by cytonuclear conflict. However, an alternative is that there is selection for compensatory nuclear mutations in response to the fixation of slightly deleterious cytoplasmic mutations. Cytoplasmic genomes typically have reduced effective sizes relative to the nuclear genome, and we have recently shown evidence supporting the expected fourfold reduction in effective size in cytoplasmic genes relative to nuclear genes in A. lyrata (Wright et al. 2008a). In animals, patterns suggesting higher amino acid fixation rates in the mitochondrial genomes of species with small population sizes are consistent with the existence of a significant class of mildly deleterious amino acid mutations (Popadin et al. 2007). Postglacial bottlenecks in A. lyrata (Ross-Ibarra et al. 2008) might drive the fixation of slightly deleterious cytoplasmic mutations, leading to the selective fixation of compensatory nuclear alleles. It has recently been hypothesized that the tremendous expansion of PPR genes in plants may in part reflect compensatory evolution for the fixation of deleterious cytoplasmic amino acid mutations (Schmitz-Linneweber and Small 2008). If the local fixation of deleterious amino acid mutations in cytoplasmic genes has occurred in A. lyrata, local directional selection may be acting on PPR genes to silence these changes. Recent evidence that one of our target loci, At1g59720, functions in RNA editing at multiple sites in the chloroplast (Okuda et al. 2009) is consistent with this. Finally, changes in PPR genes could mediate adaptive cytoplasmic mutations without any corresponding changes in cytoplasmic genes; given the low mutation rate in both plant mitochondria and chloroplasts, PPR modifications of cytoplasmic proteins may be an important engine of adaptive evolution, irrespective of coevolutionary interactions.

If local adaptation and balancing selection on cytonuclear interactions are in fact prevalent in A. lyrata, we would expect to commonly expose cytonuclear fitness effects during reciprocal crossing experiments, particularly between populations. Our observed excess of differentiation at PPR genes between European and North American populations suggests that crosses between regions may provide an opportunity to unmask cytonuclear effects on male fitness in this species. If such patterns are indeed uncovered, these highly polymorphic PPR loci represent important candidate genes contributing to cytonuclear epistasis in fitness.

PPR genes have been shown to have diverse functions, and genetic studies and direct functional characterization of more of these loci are clearly essential. Nevertheless, our population genetic data provide preliminary evidence that cytonuclear conflict may be prevalent in outcrossing hermaphrodites and may play an important role in the structuring of genomes and genetic variation. This complements other recent genetic and population-level studies of CMS variation in natural populations of Mimulus, which suggest the prevalence of cytonuclear conflicts in outcrossing plant populations (Fishman and Willis 2006; Case and Willis 2008).

We thank B. S. Gaut for discussion and comments on the manuscript. We thank J. Ross-Ibarra for assistance and advice with the pairwise population differentiation estimates. We thank R. Gaut for providing extracted DNA for 48 A. thaliana individuals. This work was supported by a Natural Sciences and Engineering Research Council of Canada Discovery Grant, an Early Researcher Award from the Ontario Ministry of Research and Innovation, and an Alfred P. Sloan Foundation Fellowship (S.I.W.).

LITERATURE CITED

Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang et al., 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Arabidopsis Genome Initiative, 2000 Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815.

Bachtrog, D., and P. Andolfatto, 2006 Selection, recombination and demographic history in Drosophila miranda. Genetics 174: 2045–2059.

Bakker, E. G., C. Toomajian, M. Kreitman and J. Bergelson, 2006 A genome-wide survey of R gene polymorphisms in Arabidopsis. Plant Cell 18: 1803–1818.

Beaumont, M. A., and R. A. Nichols, 1996 Evaluating loci for use in the genetic analysis of population structure. Proc. R. Soc. Lond. B. 263: 1619–1626.

Bentolila, S., A. A. Alfonso and M. R. Hanson, 2002 A pentatricopeptide repeat-containing gene restores fertility to cytoplasmic male-sterile plants. Proc. Natl. Acad. Sci. USA 99: 10887–10892.

Brown, G. G., N. Formanova, H. Jin, R. Wargachuk, C. Dendy et al., 2003 The radish Rfo restorer gene of Ogura cytoplasmic male sterility encodes a protein with multiple pentatricopeptide repeats. Plant J. 35: 262–272.

Budar, F., P. Touzet and R. De Paepe, 2003 The nucleo-mitochondrial conflict in cytoplasmic male sterilities revisited. Genetica 117: 3–16.

Burt, A., and R. Trivers, 2006 Genes in Conflict: The Biology of Selfish Genetic Elements. The Belknap Press of Harvard University Press, Cambridge, MA.

Case, A. L., and J. H. Willis, 2008 Hybrid male sterility in Mimulus (Phrymaceae) is associated with a geographically restricted mitochondrial rearrangement. Evolution 62: 1026–1039.

Charlesworth, B., M. Nordborg and D. Charlesworth, 1997 The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet. Res. 70: 155–174.

Charlesworth, D., 2006 Balancing selection and its effects on sequences in nearby genome regions. PLoS Genet. 2: e64.

Charlesworth, D., and V. Laporte, 1998 The male-sterility polymorphism of Silene vulgaris: analysis of genetic data from two populations and comparison with Thymus vulgaris. Genetics 150: 1267–1282.

Desloire, S., H. Gherbi, W. Laloui, S. Marhadour, V. Clouet et al., 2003 Identification of the fertility restoration locus, Rfo, in radish, as a member of the pentatricopeptide-repeat protein family. EMBO Rep. 4: 588–594.

Fishman, L., and J. H. Willis, 2006 A cytonuclear incompatibility causes anther sterility in Mimulus hybrids. Evolution Int. J. Org. Evolution 60: 1372–1381.

Frank, S. A., 1989 The evolutionary dynamics of cytoplasmic male sterility. Am. Nat. 133: 345–376.


Geddy, R., and G. G. Brown, 2007 Genes encoding pentatricopeptide repeat (PPR) proteins are not conserved in location in plant genomes and may be subject to diversifying selection. BMC Genomics 8: 130.

Gouyon, P. H., and D. Couvet, 1987 A conflict between two sexes, females and hermaphrodites. Experientia Suppl. 55: 245–261.

Hanson, M. R., 1991 Plant mitochondrial mutations and male sterility. Annu. Rev. Genet. 25: 461–486.

Hansson, B., A. Kawabe, S. Preuss, H. Kuittinen and D. Charlesworth, 2006 Comparative gene mapping in Arabidopsis lyrata chromosomes 1 and 2 and the corresponding A. thaliana chromosome 1: recombination rates, rearrangements and centromere location. Genet. Res. 87: 75–85.

Hudson, R. R., 2002 Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337–338.

Hudson, R. R., M. Kreitman and M. Aguade, 1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116: 153–159.

Hudson, R. R., M. Slatkin and W. P. Maddison, 1992 Estimation of levels of gene flow from DNA sequence data. Genetics 132: 583–589.

Kazama, T., and K. Toriyama, 2003 A pentatricopeptide repeat-containing gene that promotes the processing of aberrant atp6 RNA of cytoplasmic male-sterile rice. FEBS Lett. 544: 99–102.

Kimura, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.

Koizuka, N., R. Imai, H. Fujimoto, T. Hayakawa, Y. Kimura et al., 2003 Genetic characterization of a pentatricopeptide repeat protein gene, orf687, that restores fertility in the cytoplasmic male-sterile Kosena radish. Plant J. 34: 407–415.

Komori, T., S. Ohta, N. Murai, Y. Takakura, Y. Kuraya et al., 2004 Map-based cloning of a fertility restorer gene, Rf-1, in rice (Oryza sativa L.). Plant J. 37: 315–325.

Kotera, E., M. Tasaka and T. Shikanai, 2005 A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature 433: 326–330.

Lurin, C., C. Andres, S. Aubourg, M. Bellaoui, F. Bitton et al., 2004 Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell 16: 2089–2103.

Nakamura, T., G. Schuster, M. Sugiura and M. Sugita, 2004 Chloroplast RNA-binding and pentatricopeptide repeat proteins. Biochem. Soc. Trans. 32: 571–574.

Nordborg, M., T. T. Hu, Y. Ishino, J. Jhaveri, C. Toomajian et al., 2005 The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3: e196.

Okuda, K., A. L. Chateigner-Boutin, T. Nakamura, E. Delannoy, M. Sugita et al., 2009 Pentatricopeptide repeat proteins with the DYW motif have distinct molecular functions in RNA editing and RNA cleavage in Arabidopsis chloroplasts. Plant Cell 21: 146–156.

O’Toole, N., M. Hattori, C. Andres, K. Iida, C. Lurin et al., 2008 On the expansion of the pentatricopeptide repeat gene family in plants. Mol. Biol. Evol. 25: 1120–1128.

Popadin, K., L. V. Polishchuk, L. Mamirova, D. Knorre and K. Gunbin, 2007 Accumulation of slightly deleterious mutations in mitochondrial protein-coding genes of large versus small mammals. Proc. Natl. Acad. Sci. USA 104: 13390–13395.

Rand, D. M., R. A. Haney and A. J. Fry, 2004 Cytonuclear coevolution: the genomics of cooperation. Trends Ecol. Evol. 19: 645–653.

Ross-Ibarra, J., S. I. Wright, J. P. Foxe, A. Kawabe, L. DeRose-Wilson et al., 2008 Patterns of polymorphism and demographic history in natural populations of Arabidopsis lyrata. PLoS One 3: e2411.

Rozas, J., J. C. Sanchez-DelBarrio, X. Messeguer and R. Rozas, 2003 DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496–2497.

Schmitz-Linneweber, C., and I. Small, 2008 Pentatricopeptide repeat proteins: a socket set for organelle gene expression. Trends Plant Sci. 13: 663–670.

Schnable, P. S., and R. P. Wise, 1998 The molecular basis of cytoplasmic male sterility and fertility restoration. Trends Plant Sci. 3: 175–180.

Small, I. D., and N. Peeters, 2000 The PPR motif: a TPR-related motif prevalent in plant organellar proteins. Trends Biochem. Sci. 25: 46–47.

Tajima, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.

Thornton, K., 2003 Libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19: 2325–2327.

Thornton, K. R., and J. D. Jensen, 2007 Controlling the false-positive rate in multilocus genome scans for selection. Genetics 175: 737–750.

Wang, Z., Y. Zou, X. Li, Q. Zhang, L. Chen et al., 2006 Cytoplasmic male sterility of rice with boro II cytoplasm is caused by a cytotoxic peptide and is restored by two related PPR motif genes via distinct modes of mRNA silencing. Plant Cell 18: 676–687.

Wright, S., 1969 The Theory of Gene Frequencies. University of Chicago Press, Chicago.

Wright, S. I., and B. Charlesworth, 2004 The HKA test revisited: a maximum-likelihood-ratio test of the standard neutral model. Genetics 168: 1071–1076.

Wright, S. I., and B. S. Gaut, 2005 Molecular population genetics and the search for adaptive evolution in plants. Mol. Biol. Evol. 22: 506–519.

Wright, S. I., N. Nano, J. P. Foxe and V. U. Dar, 2008a Effective population size and tests of neutrality at cytoplasmic genes in Arabidopsis. Genet. Res. 90: 119–128.

Wright, S. I., R. W. Ness, J. P. Foxe and S. C. H. Barrett, 2008b Genomic consequences of outcrossing and selfing in plants. Int. J. Plant Sci. 169: 105–118.

Communicating editor: O. Savolainen


Supporting Information http://www.genetics.org/cgi/content/full/genetics.109.104778/DC1

Signature of Diversifying Selection on Members of the Pentatricopeptide Repeat Protein Family in Arabidopsis lyrata

John Paul Foxe and Stephen I. Wright

Copyright © 2009 by the Genetics Society of America DOI: 10.1534/genetics.109.104778


FIGURE S1.—πsyn from 10,000 coalescent simulations under the best demographic model (see text). The first panel shows the mean for the PPR gene family and the remaining panels show values for each individual PPR locus. Observed estimates are indicated by arrows.


FIGURE S2.—Tajima’s Dsyn from 10,000 coalescent simulations under the best demographic model (see text). The first panel shows the mean for the PPR gene family and the remaining panels show values for each individual PPR locus. Observed estimates are indicated by arrows.


TABLE S1

List of gene fragments sequenced and their sample size in A. lyrata in this study. PPR loci are given in bold.

Locus Sample Size

1g03560-1053744* 122

1g03560-1054749 24

1g03590-1055816 20

1g03590-1056303 20

1g03590-1056868 22

1g59710-3772518 24

1g59710-3773200 24

1g59720-3770925* 82

1g59740-3722719 74

1g74580-16338131 98

1g74580-16339064 94

1g74600-16342481* 112

1g74630-16350229 106

1g74640-16352834 82

2g28040-12173691 18

2g28050-12170614 20

2g28050-12171388* 126

2g28050-12171510 24

2g36970-17628199 24

2g36970-17629020 16

2g36980-17630803 20

2g36980-17631328* 118

2g36980-17631688 18

3g62890-20949853 14

3g62890-20950758 20

3g62890-20951026* 116

4g14170-14309884 70

4g14180-14302891 22

4g14180-14304679 22

4g14180-14304802 14

4g14180-14307137 6

4g14190-14301512* 110

4g14280-14214623 108

4g14280-14215399 88

* Fragments surveyed in Ross-Ibarra et al. (2008).


TABLE S2

List of A. thaliana accessions used in this study, with the number of individuals sampled per accession and their region of origin.

Accession Number of individuals used Region of origin

Col-0 1 unknown

RRS-7 2 U.S. Midwest (Indiana)

RRS-10 2 U.S. Midwest (Indiana)

KNO-10 2 U.S. Midwest (Indiana)

KNO-18 2 U.S. Midwest (Indiana)

RMX-A02 2 U.S. Midwest (Michigan)

RMX-A180 2 U.S. Midwest (Michigan)

PNA-17 2 U.S. Midwest (Michigan)

PNA-10 2 U.S. Midwest (Michigan)

Eden-1 2 North Sweden

Eden-2 2 North Sweden

Lov-1 2 North Sweden

Lov-5 2 North Sweden

Fab-2 2 North Sweden

Fab-4 2 North Sweden

Bil-5 2 North Sweden

Bil-7 2 North Sweden

Var-2-1 2 South Sweden

Var-2-6 2 South Sweden

Spr-1-2 2 South Sweden

Spr-1-6 2 South Sweden

Omo-2-1 2 South Sweden

Omo-2-3 2 South Sweden

Ull-2-5 2 South Sweden

Ull-2-3 2 South Sweden

Zdr-1 2 Central Europe (Czech Republic)

Zdr-6 2 Central Europe (Czech Republic)

Bor-1 2 Central Europe (Czech Republic)

Bor-4 2 Central Europe (Czech Republic)

Pu2-7 2 Croatia

Pu2-23 2 Croatia

LP2-2 2 Central Europe (Czech Republic)

LP2-6 2 Central Europe (Czech Republic)

HR-5 2 England

HR-10 2 England

NFA-8 2 England

NFA-10 2 England

SQ-1 2 England

SQ-8 2 England

CIBC-5 2 England

CIBC-17 2 England

TAMM-2 2 Finland

TAMM-27 2 Finland

KZ-1 2 Kazakhstan

KZ-9 2 Kazakhstan

GOT-7 2 Germany

GOT-22 2 Germany

REN-1 1 France

TABLE S3

Levels of diversity and differentiation at PPR genes and genome averages in A. lyrata for each population used in this study.

ICELAND πSyn^a πNonSyn TajDSyn^b TajDNonSyn FstSyn^c FstNonSyn

At1g03560 0.033775851 0.005474173 -0.739899364 -0.564630455 0.405816169 0.409731152

At1g59720 0.074805928 0.01504378 0.75202693 0.485299508 0.123630212 0.319751477

At1g74600 0.004229267 0 1.430241391 0 0.953055288 1

At2g28050 0.017747235 0.004551121 0.827003785 1.457218245 0.715840773 0.64641257

At2g36980 0.016461376 0.006844992 0.958335646 1.628429046 0.145273562 0.174847058

At3g62890 0.038829992 0.00847152 2.301453559 2.114134733 0.317772503 0.396555055

At4g14190 0.016904698 0.001912616 1.855417475 -0.549509032 0.753883062 0.751469994

PPR average 0.028964907 0.0060426 1.054939917 0.652991721 0.487895938 0.528395329

permuted mean 0.012334615 0.002179123 0.472634204 0.162742991 0.504968684 0.270315738

GERMANY πSyn πNonSyn TajDSyn TajDNonSyn FstSyn FstNonSyn

At1g03560 0.049687564 0.008366133 0.8494825 0.099031862 0.125897762 0.097897028

At1g59720 0.087298403 0.014764863 0.171142575 0.140275284 -0.022722193 0.332363516

At1g74600 0.094175947 0.012471549 2.942282427 2.169098439 -0.045349598 -0.050215477

At2g28050 0.068183156 0.017634871 0.924537954 0.547322928 -0.091712212 -0.370095153

At2g36980 0.015008338 0.007870927 0.283235749 1.165949245 0.220719884 0.051172265

At3g62890 0.014821732 0.007586172 -0.300815438 1.132377452 0.739588077 0.459620329

At4g14190 0.068618236 0.006499866 1.088525427 0.443364193 0.000981252 0.155391567

PPR average 0.056827625 0.010742055 0.851198742 0.813917058 0.132486139 0.096590582

permuted mean 0.021856224 0.004113937 0.330829138 0.167145305 -0.369993163 -0.148292811

CANADA πSyn πNonSyn TajDSyn TajDNonSyn FstSyn FstNonSyn

At1g03560 0.001025392 0.00059795 -1.164672189 -1.507756012 0.981961334 0.935524317

At1g59720 0.061203555 0.013007877 0.752859064 0.956875828 0.28298535 0.411810758

At1g74600 0 0.00073586 0 0.138693107 1 0.938034075

At2g28050 0 0 0 0 1 1

At2g36980 0 0 0 0 1 1

At3g62890 0.003580035 0 0.593481941 0 0.937100206 1

At4g14190 0.01897986 0.002465489 -2.320883623 -1.327164848 0.723670602 0.679628402

PPR average 0.012112692 0.002401025 -0.305602115 -0.248478846 0.84653107 0.852142507

permuted mean 0.005373627 0.001248186 0.201420545 0.144104985 0.716856115 0.731617506

US πSyn πNonSyn TajDSyn TajDNonSyn FstSyn FstNonSyn

At1g03560 0.001747835 0.001048113 -0.591550125 -1.440706444 0.969252128 0.886984162

At1g59720 0.00870042 0.001574731 -0.706252211 -1.001802986 0.898072442 0.928793919

At1g74600 0 0 0 0 1 1

At2g28050 0 0 0 0 1 1

At2g36980 0.006919908 0.003609256 -0.948945187 -1.077765584 0.640696586 0.564909874

At3g62890 0 0.001140876 0 0.649980502 1 0.918732918

At4g14190 0.02055604 0.002184146 -1.024325077 0.906187075 0.700722867 0.716186774

PPR average 0.005417743 0.001365303 -0.467296086 -0.280586777 0.886963432 0.859372521

permuted mean 0.005555557 0.001277209 0.359703258 0.182018719 0.711436741 0.713918109

SWEDEN πSyn πNonSyn TajDSyn TajDNonSyn FstSyn FstNonSyn

At1g03560 0.033678523 0.005725796 -0.739899364 -0.427556872 0.40752836 0.382599204

At1g59720 0.031314541 0.002848818 1.584413719 1.641452667 0.633142475 0.87118234

At1g74600 0.002721451 0.00254079 0.155745965 1.687095109 0.969791988 0.78604289

At2g28050 0.029933722 0.008722971 0.344450284 1.167481188 0.520717254 0.322291604

At2g36980 0.007346207 0.000798243 -0.237849822 0.021929652 0.618561844 0.903773037

At3g62890 0.007264957 0.001429532 -2.003179165 -1.83087563 0.872357595 0.898171271

At4g14190 0.0744422 0.006294607 2.11928621 0.930998555 -0.083810329 0.182063413

PPR average 0.026671657 0.004051537 0.17470969 0.455789239 0.562612741 0.620874823

permuted mean 0.009068391 0.001655283 0.318069453 0.303681742 0.537710295 0.569044586

RUSSIA πSyn πNonSyn TajDSyn TajDNonSyn FstSyn FstNonSyn

At1g03560 0.04037556 0.004951227 0.321201768 0.112944567 0.289714279 0.46611939

At1g59720 0.012501114 0.001675559 -2.21100974 -1.928604462 0.853546382 0.924234689

At1g74600 0 0 0 0 1 1

At2g28050 0 0 0 0 1 1

At2g36980 0.010187655 0.005203496 1.097275256 1.411195865 0.471024918 0.372726805

At3g62890 0.031165105 0.003379609 1.645761699 0.3476094 0.452441516 0.759263079

At4g14190 0.009422925 0.002747822 -1.335875919 -1.060051909 0.862810825 0.642941356

PPR average 0.01480748 0.002565387 -0.068949562 -0.159558077 0.704219703 0.737897903

permuted mean 0.006970173 0.000997386 0.270243122 0.070909296 0.596309547 0.721937385

a Synonymous and nonsynonymous nucleotide diversity as measured by πSyn and πNonSyn, where π is the average number of pairwise differences between two individuals.

b Frequency of variants in each of the PPR genes and genome averages in A. lyrata, as measured by Tajima’s D at synonymous and nonsynonymous sites.

c Synonymous and nonsynonymous pairwise population differentiation as measured by the population differentiation parameter Fst, calculated using πSyn and πNonSyn.

d Statistically significant values are marked in bold.
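The three statistics defined in the footnotes can be illustrated with a minimal sketch on a toy alignment. It assumes aligned sequences of equal length, uses the standard Tajima (1989) formula for D, and takes Fst as 1 - π(within)/π(total). The exact Fst estimator and the software used for the table are not stated here, so that form is an assumption, as are all function names. Returning 0 for D when there are no segregating sites follows the table, which reports 0 for loci with zero diversity.

```python
from itertools import combinations
from math import sqrt


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """Per-site pi: mean pairwise differences divided by alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D for n >= 4 aligned sequences (degenerate for smaller n)."""
    n = len(seqs)
    seg_sites = sum(len(set(column)) > 1 for column in zip(*seqs))
    if seg_sites == 0:
        return 0.0  # assumed convention; D is formally undefined here
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    k = mean_pairwise_differences(seqs)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (k - seg_sites / a1) / sqrt(variance)


def fst_from_pi(pi_within, pi_total):
    """Assumed estimator: Fst = 1 - pi_within / pi_total (pi_total must be > 0)."""
    return 1 - pi_within / pi_total


# Toy alignment, invented for illustration only (not data from this study).
toy = ["AAAAA", "AAAAT", "AAATT", "AATTT"]
print(nucleotide_diversity(toy), tajimas_d(toy))
```

Under this form of Fst, a population with no within-population diversity at a locus gets Fst = 1 whenever total diversity is positive, and negative values arise when within-population diversity exceeds the total.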
