Copyright © 2009 by the Genetics Society of America
DOI: 10.1534/genetics.109.104778
Signature of Diversifying Selection on Members of the Pentatricopeptide Repeat Protein Family in Arabidopsis lyrata
John Paul Foxe* and Stephen I. Wright†,1
*Department of Biology, York University, Toronto, Ontario M3J 1P3, Canada and †Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario M5S 3B2, Canada
Manuscript received May 8, 2009
Accepted for publication July 17, 2009
ABSTRACT
Pentatricopeptide repeat (PPR) proteins compose a family of nuclear-encoded transcriptional regulators of cytoplasmic genes. They have shown dramatic expansion in copy number in plants, and although the functional importance of many remains unclear, a subset has been repeatedly implicated as nuclear restorers for cytoplasmic male sterility. Here we investigate the molecular population genetics and molecular evolution of seven single-copy PPR genes in the outcrossing model plant Arabidopsis lyrata. In comparison with neutral reference loci, we find, on average, elevated levels of polymorphism and an excess of high-frequency variants at these PPR genes, suggesting that natural selection is maintaining polymorphism at some of these loci. This elevation in diversity persists when we control for divergence and generally decreases in the flanking regions, suggesting that these genes are themselves the targets of selection. Some of the PPR genes also demonstrate elevated population differentiation, which is consistent with spatially varying selection. In contrast, no comparable patterns are observed at these loci in A. thaliana, providing no evidence for the action of balancing selection in this selfing species. Taken together, these results suggest that a subset of PPR genes may be subject to balancing selection associated with ongoing cytonuclear coevolution in the outcrossing A. lyrata, which is possibly mediated either by intergenomic conflict or by compensatory evolution.
Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.109.104778/DC1.
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. GQ343552-GQ344403.
1Corresponding author: Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks St., Toronto, ON M5S 3B2, Canada. E-mail: [email protected]
Genetics 183: 663–672 (October 2009)

DESPITE the tight and ancient mutualism between cytoplasmic and nuclear genomes, conflicts of interest can repeatedly arise due to differences in modes of inheritance (Rand et al. 2004; Burt and Trivers 2006). For cytoplasmic genomes that experience maternal inheritance, mutations that enhance female fertility, even at a net cost to an individual’s survival and total reproduction, will spread via natural selection since they enhance cytoplasmic transmission (Gouyon and Couvet 1987; Frank 1989; Budar et al. 2003). One of the best lines of evidence for this type of cytonuclear conflict is found in the widespread phenomenon of cytoplasmic male sterility (CMS) in plants (Frank 1989; Schnable and Wise 1998; Budar et al. 2003), where male sterility encoded in a cytoplasmic gene leads to the spread of females in hermaphroditic plant populations. In most cases, CMS can be suppressed by nuclear-encoded restorer alleles (Bentolila et al. 2002; Brown et al. 2003; Desloire et al. 2003; Kazama and Toriyama 2003; Koizuka et al. 2003), which are selectively favored as pollen becomes limited in populations with high frequencies of CMS.

CMS has been documented in >150 plants and can persist as a reproductive polymorphism in natural populations (gynodioecy) or can appear following interspecific hybridization (Schnable and Wise 1998). The molecular basis for CMS in a number of systems has been well documented and seems to involve the mitochondrial genome in all cases. Expression of chimeric mitochondrial open reading frames that interfere with normal mitochondrial function and pollen development leads to the male sterile phenotype (Schnable and Wise 1998). Unexpectedly, nuclear restorer genes of CMS cloned from numerous divergent plant species appear to arise almost universally from the pentatricopeptide repeat (PPR) protein family (Bentolila et al. 2002; Brown et al. 2003; Desloire et al. 2003; Kazama and Toriyama 2003; Koizuka et al. 2003), which is defined by the presence of a degenerate 35-amino-acid motif. PPR genes exist in high copy numbers in plant genomes and are generally thought to act as gene-specific transcriptional regulators of cytoplasmic genes (Small and Peeters 2000; Lurin et al. 2004; Nakamura et al. 2004; Kotera et al. 2005). Although a growing number of PPR genes have been characterized (Schmitz-Linneweber and Small 2008), the functional importance of many of them remains unclear; most, if not all, are targeted to the chloroplast or mitochondria (Lurin et al. 2004). The molecular action of PPR genes varies and includes RNA editing, stability, cleavage, and splicing (Schmitz-Linneweber and Small 2008). Recent studies suggest that the expansion of PPR genes in flowering plants may be due to several waves of retrotransposition (O’Toole et al. 2008) and that a subset of PPR locations across the genome may be highly dynamic (Geddy and Brown 2007).

The evolutionary dynamics of CMS have been well documented in gynodioecious plant populations, which maintain both females and hermaphrodites within populations (Frank 1989; Hanson 1991; Charlesworth and Laporte 1998; Budar et al. 2003; Wright et al. 2008b). However, any outcrossing hermaphrodite plant population is susceptible to invasion by cytoplasmic mutants that increase female fertility by causing male sterility. Indeed, many instances of CMS have emerged from wide intraspecific or interspecific crosses, suggesting the exposure of a hidden evolutionary history of CMS and the fixation of nuclear restorers (Fishman and Willis 2006; Case and Willis 2008). Furthermore, it is possible that cryptic polymorphisms exist for cytoplasmic alleles that have more subtle effects on male fertility with nuclear-encoded suppressors of this activity. In contrast, we would not expect to find any evidence for cytonuclear conflicts in highly inbreeding species since pollen sterility mutations will reduce female fertility as well (Fishman and Willis 2006; Wright et al. 2008b).

A recent study of genomewide patterns of DNA sequence variation at 77 loci in the outcrossing model plant Arabidopsis lyrata provided preliminary evidence that a number of PPR genes show unusual diversity patterns consistent with the action of diversifying selection (Ross-Ibarra et al. 2008). In particular, the PPR genes included in the study generally exhibited high silent-nucleotide diversity, as well as high levels of between-population differentiation. One PPR locus, At1g74600, also exhibited significantly elevated levels of range-wide differentiation when applying a demographic model fit to the data, suggesting the action of disruptive selection and/or local adaptation on this locus. Interestingly, this locus, as well as several other PPR loci from this sample, shares detectable amino acid similarity with CMS fertility restorers in rice (see below). If there is diversifying selection acting on some PPR genes in A. lyrata, this may reflect ongoing evolutionary dynamics associated with cytonuclear conflict. To investigate this possibility further, we performed maximum-likelihood and further coalescent and permutation-based tests of selection on this complement of seven single-copy PPR genes and surveyed variation at flanking regions to test whether the PPR locus is the direct target of selection. In addition, we surveyed polymorphism at the same regions in the highly selfing A. thaliana. Our prediction was that, if selection associated with cytonuclear conflict drives the unusual diversity patterns, then we would not see comparable evidence for selection in flanking regions or in a related selfing species.
MATERIALS AND METHODS
Targeted PPR loci: We made use of a polymorphism data set from coding regions of 7 PPR genes and 50 “reference” genes from six populations of A. lyrata (Ross-Ibarra et al. 2008). To generate this sample of 57 loci, disease resistance genes, which have been hypothesized to be under pervasive selection, were removed for all analyses. The PPR genes studied included At1g03560, At1g59720, At1g74600, At2g28050, At2g36980, At3g62890, and At4g14190, with locus names based on designations from the A. thaliana genome project (Arabidopsis Genome Initiative 2000). These loci include representatives of all three major subclasses of PPR genes as identified by Lurin et al. (2004), including the P class (At1g03560, At2g28050, At4g14190), the E/E1 subclass (At1g74600 and At2g36980), and the DYW subclass (At1g59720 and At3g62890). Of these loci, BLASTx searches of the translated nucleotide sequences to the protein database (Altschul et al. 1997) reveal that four PPR genes (At1g03560, At1g74600, At2g28050, and At4g14190) show significant sequence homology to fertility restorers cloned in rice (Kazama and Toriyama 2003; Komori et al. 2004; Wang et al. 2006), although, in terms of motif arrangements, At1g74600 is not from the P class, which is the most common motif arrangement found in the vast majority of fertility restorers (Schmitz-Linneweber and Small 2008). Recently, At1g59720 was functionally characterized in A. thaliana as playing a role in RNA editing at multiple sites in the chloroplast (Okuda et al. 2009).
For this study, these seven loci were also sequenced in 48 accessions of A. thaliana (details below). Additionally, to expand our regional surveys of diversity around our target PPR fragments in A. lyrata, 7 primer pairs were designed from additional fragments within these genes, as well as 20 primer pairs in flanking genes (supporting information, Table S1). Flanking regions were chosen to be as physically adjacent as possible to characterize the heterogeneity in polymorphism across the focal region. All amplified regions were situated within exons and ranged in size from 200 to 700 bp. Three of these flanking loci (At1g74580, At1g74630, and At4g14170) were also found to be PPR loci. Primers for each of these flanking genes were designed using the A. thaliana genome (Arabidopsis Genome Initiative 2000). With the subsequent release of the A. lyrata genome assembly (http://genome.jgipsf.org/Araly1/Araly1.home.html), we were able to use BLAST searches to confirm the primer positions flanking our targeted PPR loci in this species and to calculate physical positions within A. lyrata.
Population samples: A. lyrata samples consisted of 65 individuals, originating from the six natural populations previously studied by Ross-Ibarra et al. (2008). The plants from which sequences were obtained include 6 individuals from Karhumaki, Russia (from O. Savolainen), 8 individuals from Stubbsand, Sweden (O. Savolainen), 15 individuals from Plech, Germany (T. Mitchell-Olds), 12 individuals from Esja Mountain, Iceland (E. Thorhallsdottir), 12 individuals from Indiana, United States (B. Mable), and 12 individuals from Rondeau Provincial Park, Ontario, Canada (B. Mable and S. Wright). The 48 A. thaliana individuals used in this study represent a subset of those used by Nordborg et al. (2005) (Table S2). With the exception of 2 populations, 2 individuals from each of 24 populations, representing a sampling across Europe and the midwestern United States, were used (Table S2). Following Nordborg et al. (2005), these 24 populations were then further categorized according to their broader geographic origin, including north Sweden, south Sweden, Central Europe, Europe, England, and the midwestern United States. DNA from a single plant from each maternal family was extracted using either the DNeasy plant DNA extraction kit (Qiagen, Valencia, CA) or the FastDNA plant DNA extraction kit (Qbiogene, CA). Extracted A. thaliana DNA was kindly provided by R. Gaut (University of California at Irvine).
PCR and sequencing: We employed a direct sequencing strategy, using single large exons for PCR amplification and sequencing to minimize the chance of unreadable sequences caused by insertion/deletion variants as described in Ross-Ibarra et al. (2008). Each exon was submitted to a BLAST search of the genomic survey sequence database (http://www.ncbi.nlm.nih.gov/BLAST/) to check for the presence of orthologous regions in the shotgun genome sequence of Brassica oleracea (Altschul et al. 1997). These orthologous regions were aligned to the A. thaliana genomic sequence to identify conserved regions for primer design. PCR primers were designed with the aid of PrimerQuest (Integrated DNA Technologies; http://biotools.idtdna.com/primerquest/) to amplify 650- to 750-bp fragments for sequencing using the same forward and reverse primers. Primers were also submitted to a BLAST search against the A. thaliana genome to ensure amplification of a single-copy region.
PCR reactions were carried out in 25-μl volumes on an Eppendorf Mastercycler. The cycles were as follows: 2 min at 94°, 20 sec at 94°, 20 sec at 55°, and 40 sec at 72° for 35 cycles, with a final extension time of 4 min at 72°.
Sequencing reactions were carried out by Lark Technologies (Houston, TX). Chromatograms were analyzed using Sequencher 4.6, using the “call secondary peaks” option to aid in the identification of heterozygous sites. All chromatograms were checked manually for heterozygous nucleotide positions, using the sequence from both strands to confirm putative heterozygous sites. Nucleotide sequences have been deposited in GenBank (accession nos. GQ343552–GQ344403).
Gene family exclusion: BLAST searches of our fragments to the A. lyrata genome assembly (http://genome.jgipsf.org/Araly1/Araly1.home.html) confirmed that our PPR genes and flanking regions were single copy, meaning that nucleotide divergence was sufficiently high with other gene family members that cross-amplification was highly unlikely. Furthermore, none of the PPR genes showed a pattern of fixed heterozygosity (i.e., all samples showed a heterozygous base) from direct sequencing, suggesting that we had successfully amplified single-copy loci. In addition, each of the seven PPR genes in A. lyrata was cloned from each of two heterozygous individuals to again ensure the absence of a gene family. PCR was performed on each PPR gene in two different individuals, and the product was cleaned using a Qiagen QIAquick PCR purification kit. The cleaned PCR product was visualized on a 1% agarose gel, and a ligation reaction was carried out using an Invitrogen TA cloning kit. Transformation reactions were performed using Invitrogen One Shot Top Ten competent Escherichia coli cells. The cell cultures were grown on ampicillin-resistant X-GAL-stained agarose plates. White colonies were screened using a colony PCR technique, and 8–10 positive clones for each individual were sequenced by Lark Technologies. Analysis of the generated sequence revealed segregation of two haplotypes, while any deviations may be accounted for by PCR recombination.
Finally, results from segregation analysis in the mapping population (Hansson et al. 2006) confirmed that we were amplifying single genomic regions, and not gene duplicates, for the following loci: At1g03560, At1g59720, and At1g74600. From the combination of these results, we are reasonably confident that we have successfully amplified only single-copy loci.
Data analysis: Synonymous and nonsynonymous sites were identified by aligning each fragment to the corresponding fragment in the A. thaliana genome sequence, identified using BLAST (Altschul et al. 1997), and by using the protein annotation from A. thaliana. Standard population genetic analyses of the sequence data were carried out using both the program DnaSP Version 4.0 (Rozas et al. 2003) and a modified version of Polymorphurama, a Perl script written by Bachtrog and Andolfatto (2006). Significant departures at individual PPR genes were assessed by comparing the average pairwise differences (π) (Tajima 1989) and Tajima’s D (Tajima 1989) at both synonymous and nonsynonymous sites for each locus to the empirical null distribution for the 50 reference genes. In addition, permutation tests for mean summary statistics across PPR genes were conducted by resampling 7 loci from the full 57-gene data set, including PPR loci and reference genes from A. lyrata. Using 100,000 permutations, we calculated the proportion of times the mean value from 7 permuted loci was as high as or higher than the observed mean from the PPR genes. In A. thaliana, permutation tests for summary statistics were conducted by resampling 7 loci from a 116-locus reference gene data set, which was generated by using a subsample of published data from Nordborg et al. (2005). These subsampled loci were selected to have a minimum of 80 synonymous sites for polymorphism analysis. For each of these 116 loci, polymorphism data from coding regions were analyzed from the same set of 48 individuals used in our PPR resequencing study.
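The permutation procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used in the study: the function name `permutation_p_value`, the seed, and the 50 reference values are hypothetical, while the seven focal values are the A. lyrata synonymous diversity estimates listed in Table 1. The number of permutations is reduced from 100,000 to keep the example fast.

```python
import random
from statistics import fmean


def permutation_p_value(all_values, focal_values, n_perm=100_000, seed=1):
    """Proportion of permuted means that are as high as or higher than
    the observed mean of the focal loci (one-sided test)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = fmean(focal_values)
    k = len(focal_values)
    hits = 0
    for _ in range(n_perm):
        # Draw k loci without replacement from the full data set,
        # which includes the focal loci themselves.
        if fmean(rng.sample(all_values, k)) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical diversity values for 50 reference loci (illustration only).
reference = [0.010 + 0.0004 * i for i in range(50)]
# Synonymous diversity at the seven PPR loci in A. lyrata (Table 1).
focal = [0.057, 0.085, 0.090, 0.062, 0.019, 0.057, 0.069]

p = permutation_p_value(reference + focal, focal, n_perm=10_000)
print(f"permutation P = {p:.4f}")
```

Because the focal loci are part of the resampled pool, the test asks how often a random draw of seven loci from the whole data set looks at least as extreme as the PPR genes.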
Levels of differentiation, as measured using Wright’s (Wright 1969) FST, were calculated using estimates of the average pairwise synonymous differences, and πsyn was calculated using a Perl script for each of the six A. lyrata populations and for the total A. lyrata data set. Loci for which we had data in at least 10 chromosomes in each population were included in the estimates. In A. thaliana, each of the populations (as defined by Nordborg et al. 2005) consisted of six to eight individuals, and πsyn was calculated using DnaSP Version 4.0 (Rozas et al. 2003). To estimate global differentiation, we calculated FST using the formula of Hudson et al. (1992). In addition, FST values for each pair of populations were calculated using a Perl script written by J. Ross-Ibarra (University of California, Davis, CA).
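For orientation, the estimator of Hudson et al. (1992) can be written as FST = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb the mean number between populations. The sketch below assumes aligned sequences of equal length and averages Hw over populations and Hb over population pairs; that averaging scheme, the function names, and the toy sequences are assumptions made for illustration and are not taken from the Perl scripts used in the study.

```python
from itertools import combinations


def pairwise_diff(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean pairwise difference among sequences of one population."""
    pairs = list(combinations(pop, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Mean pairwise difference between sequences of two populations."""
    total = sum(pairwise_diff(a, b) for a in pop1 for b in pop2)
    return total / (len(pop1) * len(pop2))


def hudson_fst(pops):
    """FST = 1 - Hw/Hb, with Hw averaged over populations and
    Hb averaged over all pairs of populations (an assumed scheme)."""
    hw = sum(mean_within(p) for p in pops) / len(pops)
    between = [mean_between(p, q) for p, q in combinations(pops, 2)]
    hb = sum(between) / len(between)
    return 1 - hw / hb


# Toy data: two populations of two sequences each (hypothetical).
toy = [["AA", "AT"], ["TA", "TT"]]
print(round(hudson_fst(toy), 3))
```

With small samples this estimator can take negative values when within-population differences exceed between-population differences, as seen for some A. thaliana entries in Table 1.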
A maximum-likelihood HKA (mlHKA) test was performed using the mlHKA program (Wright and Charlesworth 2004). This model is based on the HKA test that evaluates the fit of polymorphism and divergence to expectations under the neutral theory (Hudson et al. 1987). Under the neutral theory, within-species diversity should correlate with between-species divergence (Kimura 1983). The HKA test utilizes polymorphism data from within a single species and sequence divergence data from a related species to compare the relative amounts of polymorphism and divergence across multiple loci. The mlHKA test allows for a test of selection at individual loci or for a class of genes in a similar multilocus framework. Between-species divergence was calculated using Jukes–Cantor-corrected synonymous divergence from A. thaliana. We restricted this analysis to those loci for which we had sequence information for individuals from each of the six populations in A. lyrata. The program was run under a strictly neutral model for a total of 2 million chains, followed by a selection model in which the 7 PPR genes were designated candidate genes for the action of selection, again for a total of 2 million chains. Significance was assessed using the likelihood-ratio test where twice the difference in log likelihood between the models is approximately chi-squared distributed with 7 d.f., corresponding to the difference in the number of parameters between the neutral model and the selection model. Note that, if only a subset of the PPR genes is under selection, this test, which assumes the independent action of selection on each PPR locus, should be conservative.
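The likelihood-ratio comparison can be reproduced with the standard library alone. The sketch below is not part of the mlHKA program; `chi2_sf` and `likelihood_ratio_test` are hypothetical helper names, and the two log-likelihoods are the values reported in Table 2. The degrees of freedom are left as an argument because the text above states 7 d.f. whereas Table 2 lists 6 in parentheses.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate with integer df,
    built up with the recurrence Q(k+2) = Q(k) + h^(k/2) e^-h / Gamma(k/2+1),
    where h = x/2."""
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2                 # closed form for df = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1      # closed form for df = 1
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the statistic 2 * (lnL_alt - lnL_null) and its P value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Log-likelihoods of the neutral and selection models from Table 2.
stat, p = likelihood_ratio_test(-302.94, -294.21, df=7)
print(f"LR statistic = {stat:.2f}, P = {p:.4f}")
```

Calling `chi2_sf(stat, 6)` instead gives the tail probability under 6 d.f., so both readings of the test can be compared directly.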
Although mlHKA tests for unusual patterns of diversity while controlling for divergence, the demographic history of A. lyrata may have inflated the variance in diversity beyond that assumed under the standard neutral model. To test for elevated diversity at PPR genes while controlling for both divergence and demography, we therefore employed coalescent simulations under the demographic model for these populations as inferred by Ross-Ibarra et al. (2008). Briefly, this model assumes an ancestral population that splits into six daughter populations, each of which experiences a bottleneck, followed by a recovery to current population sizes. We simulated this demographic history using point estimates of each parameter in the model (see Table 1 of Ross-Ibarra et al. 2008). Each locus was simulated using this model, but the population mutation parameter was rescaled for each gene by a factor Ks_i/Ks_average, where Ks_i is the per-site synonymous divergence between A. lyrata and A. thaliana at locus i and Ks_average is the average synonymous divergence across loci. In this way, each locus has a distinct mutation rate estimated by the amount of between-species divergence. Allowing for locus-specific mutation rates provides a more conservative test for selection, since the variance across loci in diversity can be explained by mutation rather than by selection. We ran simulations 10,000 times for each of the 53 loci used in the mlHKA analysis using the program msstats, which reads in data from the coalescent simulation program ms and calculates several common summary statistics (Hudson 2002; Thornton 2003). For each simulated data set, we calculated π and Tajima’s D values. We then calculated the average π and Tajima’s D values at the seven simulated PPR genes and compared them with the global multilocus average. These ratios were calculated for our observed data at synonymous sites, and significance was assessed by comparing to the simulated distribution. In addition, observed πsyn and Tajima’s D synonymous values of each individual PPR locus were compared with the simulated distribution for that locus to assess significance for individual loci.
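The divergence-based rescaling of the population mutation parameter amounts to multiplying a common value by each locus's synonymous divergence relative to the average. The following sketch illustrates only that arithmetic; the function name, the locus labels, and all numerical values are hypothetical and do not come from the data set or from the ms command lines used in the study.

```python
def rescale_theta(theta_average, ks_by_locus):
    """Scale a common population mutation parameter by Ks_i / Ks_average
    so that each locus receives its own divergence-based mutation rate."""
    ks_average = sum(ks_by_locus.values()) / len(ks_by_locus)
    return {
        locus: theta_average * ks / ks_average
        for locus, ks in ks_by_locus.items()
    }


# Hypothetical per-site synonymous divergence values for three loci.
ks = {"locusA": 0.10, "locusB": 0.15, "locusC": 0.20}
thetas = rescale_theta(0.02, ks)
for locus, theta in thetas.items():
    print(locus, round(theta, 4))
```

A locus with average divergence keeps the common value, while more divergent loci are simulated with proportionally higher mutation input, which is what makes the test conservative with respect to mutation-rate variation.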
RESULTS
Levels of diversity and differentiation at PPR genes in A. lyrata and A. thaliana: Levels of diversity and differentiation were calculated for both PPR genes and genomewide using 57 (50 reference and 7 PPR) nuclear-encoded loci in A. lyrata (as described by Ross-Ibarra et al. 2008) and 116 nuclear-encoded loci in A. thaliana (as described by Nordborg et al. 2005). Each of the 7 PPR genes in this study was resequenced in 48 individuals in A. thaliana, representing a subset of those population samples used by Nordborg et al. (2005) to allow for direct comparison of the levels of diversity and differentiation in PPR genes in these two different species (Table S2).
Table 1 shows synonymous and nonsynonymous nucleotide diversity for the seven original PPR gene fragments and the genomewide patterns for both A. lyrata and A. thaliana. In A. lyrata, both synonymous and nonsynonymous diversity levels were significantly elevated at PPR genes in comparison with the genomic distribution. The mean value of both πsyn and πnonsyn at PPR genes showed a significant elevation compared to a permuted mean from seven random loci from the entire polymorphism data set (P < 0.001). Individually, six of the seven PPR genes showed significantly elevated πsyn values when compared to the genomewide distribution, including all four loci that share sequence homology with rice fertility restorers (At1g03560, At1g74600, At2g28050, and At4g14190). In contrast, there was no significant increase in the levels of synonymous diversity at PPR genes in A. thaliana. However, πnonsyn at PPR genes showed a significant elevation compared to a permuted mean from seven random loci in the A. thaliana reference polymorphism data set (P < 0.05).

TABLE 1
Levels of diversity and differentiation at PPR genes and genome averages in A. lyrata and A. thaliana

               πsyn(a)                 πnonsyn                 Tajima’s Dsyn(b)        Tajima’s Dnonsyn        FSTsyn(c)               FSTnonsyn
Locus          A. lyrata   A. thaliana A. lyrata   A. thaliana A. lyrata   A. thaliana A. lyrata   A. thaliana A. lyrata   A. thaliana A. lyrata   A. thaliana
At1g03560      0.057***    <0.001      0.009       0.001       2.946*      NA          2.113       0.076       0.454       0.000       0.530       0.313
At1g59720      0.085***    0.012       0.022       0.015       0.234       1.156       0.561       2.405       0.396       −0.011      0.631       0.006
At1g74600      0.090***    <0.001      0.012       0.001       3.761***    NA          3.773       −1.075      0.697       0.000       0.779       0.162
At2g28050      0.062***    0.002       0.013       <0.001      1.287       −1.764      0.182       −1.868      0.592       0.325       0.600       0.750
At2g36980      0.019       0.005       0.008       0.001       0.964       −0.246      2.975       −1.005      0.442       0.323       0.511       0.237
At3g62890      0.057***    <0.001      0.014       <0.001      1.751       −1.111      2.687       NA          0.617       0.032       0.739       NA
At4g14190      0.069***    0.001       0.008       0.002       1.803       −1.576      −0.199      −0.577      0.423       −0.068      0.521       0.067
PPR average    0.063*      0.003       0.012       0.003       1.821***    −0.708      1.727       −0.341      0.517       0.086       0.616       0.256
Permuted mean  0.019       0.006       0.005       0.002       0.442       −0.444      −0.082      −0.762      0.399       0.355       0.488       0.001

Estimates of synonymous and nonsynonymous nucleotide diversity, frequency of variants and pairwise population differentiation at PPR gene fragments, and genome averages in A. lyrata and A. thaliana. Statistically significant values: p < 0.05 using permutation tests is indicated by boldface type; ***, p < 0.001 using coalescent simulations conducted only for synonymous sites in A. lyrata; *, p < 0.05 using coalescent simulations conducted only for synonymous sites in A. lyrata.
(a) Synonymous and nonsynonymous nucleotide diversity as measured by πsyn and πnonsyn, where π is the average number of pairwise differences between two individuals.
(b) Frequency of variants in each of the PPR genes and genome averages in A. lyrata and A. thaliana as measured by calculating Tajima’s D synonymous and nonsynonymous.
(c) Synonymous and nonsynonymous pairwise population differentiation estimates as measured by the population differentiation parameter FST calculated using πsyn and πnonsyn.
Synonymous and nonsynonymous nucleotide diversity for the 7 PPR gene fragments and the genomewide patterns for A. lyrata were also calculated for each of the six A. lyrata populations used in this study (Table S3). The mean value of both πsyn and πnonsyn at PPR genes showed a significant elevation compared to the genome average in Iceland, Germany, and Sweden. πnonsyn at PPR genes also showed a significant elevation compared to the genome average in Russia. When examining individual genes, 12 of 42 (29%) gene-population combinations showed elevated synonymous diversity, while none showed significantly reduced diversity within populations (Table S3). One caveat with this latter result is that low within-population diversity is common in the reference genes (Ross-Ibarra et al. 2008), providing little power to detect unusually reduced polymorphism. In addition, 11 of 42 (26%) gene-population comparisons showed elevated nonsynonymous diversity (Table S3).
We investigated the site frequency spectrum in each of the PPR genes by calculating Tajima’s D at synonymous and nonsynonymous sites (listed in Table 1). Both the mean synonymous and nonsynonymous Tajima’s D for PPR genes were significantly elevated compared with the permuted mean of the entire multilocus data set in A. lyrata (P < 0.001). In addition, At1g03560 and At1g74600 showed individually significant Tajima’s D values in comparison with reference genes (Table 1). Within populations, Tajima’s D was less often significant (Table S3); only 2 of 42 (5%) gene-population comparisons were significant at synonymous sites, and the mean Tajima’s D was not significantly elevated within individual populations. However, the average nonsynonymous Tajima’s D was significantly elevated in Germany. No elevation of Tajima’s D was observed in A. thaliana.
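As a reminder of what the statistic measures, Tajima’s D (Tajima 1989) contrasts the mean number of pairwise differences with the expectation based on the number of segregating sites, scaled by an estimate of the standard deviation of that difference. The sketch below implements the standard formula for a set of aligned sequences without missing data; it is not the DnaSP or Perl implementation used in the study, and the function name and toy alignments are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Return (pi, S, D) for aligned sequences of equal length.
    pi is the mean number of pairwise differences (per locus, not per site)."""
    n = len(seqs)
    # Number of segregating sites: alignment columns with more than one state.
    S = sum(len(set(col)) > 1 for col in zip(*seqs))
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    ) / n_pairs
    if S == 0:
        return pi, S, float("nan")  # D is undefined without variation
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
    return pi, S, d


# Toy alignment with variants at a range of frequencies (hypothetical).
pi, S, d = tajimas_d(["AAAA", "AAAT", "AATT", "ATTT"])
print(f"pi = {pi:.3f}, S = {S}, D = {d:.3f}")
```

Positive values arise when pairwise diversity exceeds the value expected from the number of segregating sites, whereas an alignment in which every variant is a singleton yields a negative value.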
Levels of population differentiation at synonymous and nonsynonymous sites, as measured using FST, are shown in Table 1. Although no individual locus showed elevated FSTsyn compared with the empirical null distribution, the mean FSTsyn for PPR genes in A. lyrata was significantly elevated compared with the genome average (Table 1; P < 0.05). Although FSTnonsyn was not significantly elevated, the trend suggested a mean value higher than the permuted mean (Table 1). Neither FSTsyn nor FSTnonsyn was significantly elevated for PPR genes in A. thaliana, and, as with A. lyrata, the trend suggested a mean value higher than the permuted mean (Table 1). Three loci, however, At1g03560, At2g28050, and At2g36980, individually displayed significantly elevated FSTnonsyn values in A. thaliana.

FSTsyn for both PPR genes and the genome average was calculated for each pair of populations in A. lyrata. Of the 15 possible population combinations, the average FSTsyn in PPR genes was significantly elevated above that of the genome average in Canada–Russia, Germany–Canada, Germany–United States, and Russia–United States while FSTnonsyn in PPR genes was significantly elevated in Canada–Russia, Germany–Canada, Germany–United States, and Sweden–United States (Figure 1). Thus, in both cases, ~30% of FST’s are significantly elevated, although it should be emphasized that these tests are not independent, given the recurrent use of individual populations.

Figure 1. Pairwise population differentiation at synonymous and nonsynonymous sites. The population differentiation parameters FSTsyn (A) and FSTnonsyn (B) for both PPR genes and the genome average were calculated for each pair of populations in A. lyrata. FST values were calculated using estimates of the average pairwise synonymous differences, πsyn (A), and estimates of the average pairwise nonsynonymous differences, πnonsyn (B), where π is the average number of pairwise differences between two individuals. Statistical significance as estimated using permutation tests is indicated by an asterisk.
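The count of 15 pairwise comparisons, and the reason these comparisons are not independent, can be checked by enumerating the population pairs. The short sketch below uses the six sampled A. lyrata populations and the four pairs reported above as significant at synonymous sites; the variable names are illustrative only.

```python
from collections import Counter
from itertools import combinations

populations = ["Russia", "Sweden", "Germany", "Iceland", "United States", "Canada"]

# All unordered pairs of the six populations.
pairs = list(combinations(populations, 2))

# Each population contributes to several pairs, so the tests share data.
appearances = Counter(pop for pair in pairs for pop in pair)

# Pairs with significantly elevated FSTsyn at PPR genes, as listed in the text.
significant_syn = [
    ("Canada", "Russia"),
    ("Germany", "Canada"),
    ("Germany", "United States"),
    ("Russia", "United States"),
]
fraction = len(significant_syn) / len(pairs)

print(len(pairs), dict(appearances))
print(f"fraction significant = {fraction:.2f}")
```

Every population enters five of the comparisons, so a single unusual population can influence several of the significant pairs at once.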
Estimates of diversity calculated while controlling for divergence: While our initial analysis suggested elevated diversity at PPR genes in A. lyrata, it is possible that this increase in diversity is due either to the action of balancing selection maintaining polymorphism at these loci or to an increased mutation rate. While the elevated Tajima’s D and population differentiation suggest diversifying selection, we wanted to investigate further the possibility of an elevated mutation rate, particularly since the variance in FST and Tajima’s D can be influenced by locus-specific heterozygosity (Tajima 1989; Beaumont and Nichols 1996), which could potentially influence our statistical comparisons of PPR genes with reference genes.

If diversity was elevated in PPR genes due to a higher mutation rate, we would expect this to be reflected in high divergence at these genes. As shown in Figure 2, diversity still appears elevated in PPR genes compared to the genome average in A. lyrata when compared with loci having similar divergence. To explicitly test for elevated diversity at PPR genes controlling for divergence, we employed an mlHKA test (Wright and Charlesworth 2004). The mlHKA test is based on the HKA test, which uses multilocus nucleotide data to test for neutral evolution by comparing within-species diversity and between-species divergence (Hudson et al. 1987). Under this framework, a model allowing selection on PPR genes as a class shows a significant improvement over a model that assumes that all genes are neutral (Table 2). Estimates of the parameter k, the degree to which diversity is decreased or increased relative to neutral expectation, show values generally >1, indicating a general elevation of diversity relative to divergence (Table 2).
Figure 2.—Estimates of synonymous diversity (πsyn, the average pairwise differences) vs. per-site synonymous divergence (Ks) in A. lyrata.
TABLE 2
Maximum-likelihood analysis of silent polymorphism in A. lyrata
Model lnL^a Likelihood-ratio statistic (d.f.) P value
Neutral (all k = 1) -302.94
Selection on 7 PPR genes -294.21 17.462 (6) <0.01
k value^b
Locus Neutral (all k = 1) Selection on 7 PPR genes
AT1G03560 1 1.30109
AT1G59720 1 3.12742
AT1G74600 1 1.71626
AT2G28050 1 1.66076
AT2G36980 1 0.832818
AT3G62890 1 1.12705
AT4G14190 1 1.89979
The model was run under two models: a neutral model and a selection model in which the PPR genes are the target of selection.
a lnL is the log-likelihood value for each model.
b k value estimates of each of the PPR genes in A. lyrata as calculated using mlHKA, where k is a measure of the degree to which diversity is significantly elevated or reduced in comparison with neutrality.
Although mlHKA controls for differences across loci in mutation rate, it does not account for inflated variance in diversity due to demographic history. In particular, previous work has inferred that A. lyrata populations have experienced recent severe population bottlenecks, which may contribute to excess polymorphism at a subset of loci. To explicitly control for this, we made use of demographic inferences from the approximate Bayesian computation implemented for these populations by Ross-Ibarra et al. (2008). We simulated the inferred population bottlenecks and subsequent recovery in six independent populations, all derived from a single ancestral population, and accounted for differences in population mutation rate by scaling each locus by synonymous divergence. Consistent with our other approaches, the observed ratio of synonymous diversity at PPR genes relative to the total data set is significantly elevated compared with simulated data sets, and six of seven individual loci show evidence for significantly elevated polymorphism (Table 1, Figure S1). In addition, the average Tajima’s D and the Tajima’s D values at At1g03560 and At1g74600 show significant elevation (Table 1, Figure S2).
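The comparison of an observed statistic with simulated data sets can be sketched as an upper-tail proportion. The example below assumes the statistic is the mean diversity of the gene class divided by the mean over the whole data set, which is one reading of the ratio described above; the function names and all numbers are hypothetical, and the simulated values would in practice come from coalescent simulations under the inferred demographic model.

```python
from statistics import mean


def diversity_ratio(class_values, all_values):
    """Mean diversity of a gene class divided by the mean over the whole data set."""
    return mean(class_values) / mean(all_values)


def upper_tail_p(observed, simulated):
    """Share of simulated statistics that are at least as large as the observed one."""
    if not simulated:
        raise ValueError("need at least one simulated value")
    return sum(value >= observed for value in simulated) / len(simulated)


# Made-up numbers for illustration only; they are not values from this study.
observed = diversity_ratio([0.04, 0.06], [0.04, 0.06, 0.01, 0.01])
simulated_ratios = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
print(observed, upper_tail_p(observed, simulated_ratios))
```

A small proportion indicates that the observed ratio lies in the upper tail of the simulated distribution, which is the sense in which the arrows in Figures S1 and S2 are read against the simulated histograms.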
Diversity estimates at PPR genes and their flanking regions: According to our hypothesis, PPR genes should be the direct target of selection, and diversity levels should be lower in genes surrounding them. To test this, we analyzed diversity in the regions surrounding each of the seven PPR genes in A. lyrata. In each case, diversity was calculated in 100-bp overlapping windows. Figure 3 shows πsyn at each of the seven PPR gene fragments used in this study as well as πsyn at seven fragments within these PPR loci and 20 fragments adjacent to the PPR genes. For a number of loci, particularly At1g03560, At1g59720, At1g74600, and At2g28050, πsyn can be seen to peak at the PPR gene fragment and decay with distance from the PPR gene.
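A windowed diversity scan of the kind described can be sketched as follows. This is a simplified illustration: it counts differences at all sites rather than at synonymous sites only, and the step between overlapping windows (25 bp by default here) is a hypothetical value, since the overlap is not stated in this section.

```python
from itertools import combinations


def pi_per_site(seqs):
    """Average pairwise differences per site among aligned sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


def sliding_pi(seqs, window=100, step=25):
    """Return (window start, per-site pi) for overlapping windows along an alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start, pi_per_site(chunk)))
    return results


# Toy alignment (hypothetical): all differences lie in the second half.
toy = ["AAAAAAAA", "AAAATTTT"]
print(sliding_pi(toy, window=4, step=2))
```

Plotting the second element of each pair against the window start gives a profile comparable in spirit to the panels of Figure 3, where a peak over the gene fragment followed by decay in flanking windows is the pattern of interest.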
DISCUSSION
A number of PPR loci surveyed were found to display elevated levels of polymorphism, excess high-frequency variants, and in some cases elevated among-population differentiation in comparison with neutral loci in A. lyrata. Taken together, these results suggest the action of diversifying selection on some PPR genes in this species. Given the putative function of these loci as regulators of cytoplasmic genes and the sequence similarity of some of them to fertility restorers in CMS, our results suggest that there may be selection associated with ongoing coevolution between cytoplasmic and nuclear genes. The lack of comparable evidence for selection on these loci in the highly selfing A. thaliana is at least consistent with the hypothesis that these unusual diversity patterns are reflective of cytonuclear conflict in the outcrossing A. lyrata. The one exception is evidence for a slight but significant excess of nonsynonymous polymorphism in A. thaliana; given this pattern in the absence of any other evidence for selection, this could reflect a relaxation of selective constraint on this class of proteins relative to the other genes sequenced.
Figure 3.—Silent diversity estimates (πsyn, the average pairwise differences) at each of the seven PPR genes and their flanking regions in A. lyrata (A–G). PPR loci (including those flanking genes deemed to be PPR loci) are labeled in A–G. Solid lines indicate the position of the PPR gene coding sequence. Open boxes represent the positions of the PPR motifs within each PPR gene. Horizontal dashed lines represent the A. lyrata genome average value of πsyn. Vertical hatched lines indicate gaps in physical location for which sequence data were not obtained.
A dominant challenge in molecular population genetic studies is distinguishing the role of locus-specific selection from other factors that can create a high variance in diversity, such as population history and variation in mutation rates (Wright and Gaut 2005). In this study, we used four approaches to test for selection on PPR genes in these populations while controlling for other potential factors influencing diversity. First, our nonparametric approach made use of the multilocus data set to generate an empirical null distribution against which to test candidate genes. This approach indicated that the average diversity, Tajima’s D, and FST values for our PPR genes show significant elevation compared with the reference loci and that six of our seven PPR genes show individual diversity levels in excess relative to the empirical null distribution. Since demographic factors are not expected to act specifically on a particular class of gene, this approach should be robust to the historical factors inflating the variance in diversity across loci. Second, our analysis of flanking regions generally provides supporting evidence that the elevated diversity is particularly localized at the PPR genes themselves. Third, mlHKA suggests that mutation rate differences alone cannot account for the excess diversity. Finally, our simulation-based approach that controls for both variation in mutation rate across loci and the inferred demographic history supports our conclusions that this class of gene shows unusually elevated diversity.
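The first, nonparametric approach can be illustrated with a small resampling scheme. The sketch below assumes that the empirical null is built from the means of equal-sized subsets of reference loci, which may differ in detail from the permutation procedure actually used; the function name and the input values are hypothetical. It enumerates every subset, which is only practical for small data sets, and random subsets would be drawn instead for larger ones.

```python
from itertools import combinations
from statistics import mean


def class_mean_p(candidate_values, reference_values):
    """Share of equal-sized reference subsets whose mean is >= the candidate mean."""
    k = len(candidate_values)
    if k == 0 or k > len(reference_values):
        raise ValueError("candidate set must be non-empty and no larger than the reference set")
    observed = mean(candidate_values)
    subset_means = [mean(subset) for subset in combinations(reference_values, k)]
    return sum(m >= observed for m in subset_means) / len(subset_means)


# Hypothetical per-locus diversity values, for illustration only.
reference = [1, 2, 3, 4]
print(class_mean_p([5, 6], reference))  # candidate mean exceeds every subset mean
print(class_mean_p([2, 3], reference))
```

Because the null distribution is drawn from loci sampled in the same populations, demographic effects shared across the genome enter both the candidate and the reference values, which is the rationale given above for the robustness of this approach.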
An advantage of the fourth approach applying coalescent simulations is that both demographic history and heterogeneity in mutation rates are controlled for in the same analysis. Although the simulations performed by Ross-Ibarra et al. (2008) suggested that the demographic model alone provided a good fit to both the mean and the variance of diversity statistics, these simulations allowed for locus-specific mutation rates inferred only from diversity patterns. The previous analysis was thus not constrained in plausible mutation rates by divergence between species. In other words, the previous simulation approach allowed all differences in levels of diversity across loci to be explained by differences in mutation rate. Our results for PPR genes here suggest that the demographic model alone may not explain the entire variance in diversity statistics, once mutation rates are scaled by divergence. As multilocus data sets accumulate for this species, it will be interesting to re-explore the extent to which heterogeneity in diversity patterns in A. lyrata can be explained by demographic history alone, or whether hitchhiking is common across the genome.
In this study, we have taken the approach of studying PPR genes as a class, since neutral demographic processes are not expected to affect a particular set of genes differentially, which may allow for increased power to detect recurring selection acting on a gene family. Since there can be low power to detect many types of selection (Thornton and Jensen 2007), examining a set of genes subject to shared selective history should enhance the ability to detect selection (Bakker et al. 2006). Unsurprisingly, however, the signature of balancing selection is not consistent across all of our PPR genes. At2g36980, for example, does not show any patterns suggesting elevated diversity. Furthermore, only two individual loci show significantly elevated Tajima’s D values, although an additional three show values considerably higher than the multilocus average. Finally, several PPR genes flanking our target loci do not show prominent signatures of selection (Figure 3). PPR genes, many of which show severe knockout phenotypes, are thought to play a role in post-transcriptional processes through an RNA-binding mechanism and have been implicated in a variety of essential functions. Thus many of these loci are likely to be subject to neutral patterns of diversity and molecular evolution and may not be subject to coevolutionary interactions with their cytoplasmic targets.
High nucleotide diversity in range-wide population samples can be indicative of two types of selective processes, collectively called balancing selection by molecular population geneticists (Charlesworth 2006). First, frequency-dependent selection and heterozygote advantage can selectively maintain high polymorphism within populations and may (although not necessarily) also distort the site frequency spectrum toward an excess of high-frequency variants (Charlesworth 2006). Second, local adaptation and spatially varying selection across populations can also act to maintain high levels of species-wide diversity, giving a signature of balancing selection (Charlesworth 2006). Given the evidence for a general excess of both diversity and differentiation, spatially varying selection may predominate. On the other hand, we did observe high within-population polymorphism at several PPR loci, and single-locus tests reveal little sign of high differentiation (Table S3). Nevertheless, simulation results indicate that local selection coupled with migration can also lead to excess within-population polymorphism (Charlesworth et al. 1997), and the power to detect local adaptation from single-locus tests may be low in a highly structured species such as A. lyrata (Ross-Ibarra et al. 2008).
One possibility for a mode of selection on some PPR genes is local selection for regulation of population-specific cytoplasmic alleles. Given the evidence for elevated between-population differentiation, particularly between European and North American population pairs, it is possible that distinct cytoplasmic variants arise in individual regions, contributing to regional directional selection on PPR alleles. An alternative possibility is frequency-dependent selection; increases in the frequency of cytoplasmic mutants that reduce male fertility, either quantitatively or qualitatively, could select for rare PPR alleles. As the suppressor increases in frequency, this favors rare cytoplasmic variants, and thus the rare restorer alleles, and could act to maintain variation over long evolutionary timescales.
Our primary hypothesis is that PPR genes are under selection mediated by cytonuclear conflict. However, an alternative is that there is selection for compensatory nuclear mutations in response to the fixation of slightly deleterious cytoplasmic mutations. Cytoplasmic genomes typically have reduced effective sizes relative to the nuclear genome, and we have recently shown evidence supporting the expected fourfold reduction in effective size in cytoplasmic genes relative to nuclear genes in A. lyrata (Wright et al. 2008a). In animals, patterns suggesting higher amino acid fixation rates in the mitochondrial genomes of species with small population sizes are consistent with the existence of a significant class of mildly deleterious amino acid mutations (Popadin et al. 2007). Postglacial bottlenecks in A. lyrata (Ross-Ibarra et al. 2008) might drive the fixation of slightly deleterious cytoplasmic mutations, leading to the selective fixation of compensatory nuclear alleles.
It has recently been hypothesized that the tremendous expansion of PPR genes in plants may in part reflect compensatory evolution for the fixation of deleterious cytoplasmic amino acid mutations (Schmitz-Linneweber and Small 2008). If the local fixation of deleterious amino acid mutations in cytoplasmic genes has occurred in A. lyrata, local directional selection may be acting on PPR genes to silence these changes. Recent evidence that one of our target loci, At1g59720, functions in RNA editing at multiple sites in the chloroplast (Okuda et al. 2009) is consistent with this. Finally, changes in PPR genes could mediate adaptive cytoplasmic mutations without any corresponding changes in cytoplasmic genes; given the low mutation rate in both plant mitochondria and chloroplasts, PPR modifications of cytoplasmic proteins may be an important engine of adaptive evolution, irrespective of coevolutionary interactions.
If local adaptation and balancing selection on cytonuclear interactions are in fact prevalent in A. lyrata, we would expect to commonly expose cytonuclear fitness effects during reciprocal crossing experiments, particularly between populations. Our observed excess of differentiation at PPR genes between European and North American populations suggests that crosses between regions may provide an opportunity to unmask cytonuclear effects on male fitness in this species. If such patterns are indeed uncovered, these highly polymorphic PPR loci represent important candidate genes contributing to cytonuclear epistasis in fitness.
PPR genes have been shown to have diverse functions, and genetic studies and direct functional characterization of more of these loci are clearly essential. Nevertheless, our population genetic data provide preliminary evidence that cytonuclear conflict may be prevalent in outcrossing hermaphrodites and may play an important role in the structuring of genomes and genetic variation. This complements other recent genetic and population-level studies of CMS variation in natural populations of Mimulus, which suggest the prevalence of cytonuclear conflicts in outcrossing plant populations (Fishman and Willis 2006; Case and Willis 2008).
We thank B. S. Gaut for discussion and comments on the manuscript. We thank J. Ross-Ibarra for assistance and advice with the pairwise population differentiation estimates. We thank R. Gaut for providing extracted DNA for 48 A. thaliana individuals. This work was supported by a Natural Sciences and Engineering Research Council of Canada Discovery Grant, an Early Researcher Award from the Ontario Ministry of Research and Innovation, and an Alfred P. Sloan Foundation Fellowship (S.I.W.).
LITERATURE CITED
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang et al., 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
Arabidopsis Genome Initiative, 2000 Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815.
Bachtrog, D., and P. Andolfatto, 2006 Selection, recombination and demographic history in Drosophila miranda. Genetics 174: 2045–2059.
Bakker, E. G., C. Toomajian, M. Kreitman and J. Bergelson, 2006 A genome-wide survey of R gene polymorphisms in Arabidopsis. Plant Cell 18: 1803–1818.
Beaumont, M. A., and R. A. Nichols, 1996 Evaluating loci for use in the genetic analysis of population structure. Proc. R. Soc. Lond. B. 263: 1619–1626.
Bentolila, S., A. A. Alfonso and M. R. Hanson, 2002 A pentatricopeptide repeat-containing gene restores fertility to cytoplasmic male-sterile plants. Proc. Natl. Acad. Sci. USA 99: 10887–10892.
Brown, G. G., N. Formanova, H. Jin, R. Wargachuk, C. Dendy et al., 2003 The radish Rfo restorer gene of Ogura cytoplasmic male sterility encodes a protein with multiple pentatricopeptide repeats. Plant J. 35: 262–272.
Budar, F., P. Touzet and R. De Paepe, 2003 The nucleo-mitochondrial conflict in cytoplasmic male sterilities revisited. Genetica 117: 3–16.
Burt, A., and R. Trivers, 2006 Genes in Conflict: The Biology of Selfish Genetic Elements. The Belknap Press of Harvard University Press, Cambridge, MA.
Case, A. L., and J. H. Willis, 2008 Hybrid male sterility in Mimulus (Phrymaceae) is associated with a geographically restricted mitochondrial rearrangement. Evolution 62: 1026–1039.
Charlesworth, B., M. Nordborg and D. Charlesworth, 1997 The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet. Res. 70: 155–174.
Charlesworth, D., 2006 Balancing selection and its effects on sequences in nearby genome regions. PLoS Genet. 2: e64.
Charlesworth, D., and V. Laporte, 1998 The male-sterility polymorphism of Silene vulgaris: analysis of genetic data from two populations and comparison with Thymus vulgaris. Genetics 150: 1267–1282.
Desloire, S., H. Gherbi, W. Laloui, S. Marhadour, V. Clouet et al., 2003 Identification of the fertility restoration locus, Rfo, in radish, as a member of the pentatricopeptide-repeat protein family. EMBO Rep. 4: 588–594.
Fishman, L., and J. H. Willis, 2006 A cytonuclear incompatibility causes anther sterility in Mimulus hybrids. Evolution Int. J. Org. Evolution 60: 1372–1381.
Frank, S. A., 1989 The evolutionary dynamics of cytoplasmic male sterility. Am. Nat. 133: 345–376.
Geddy, R., and G. G. Brown, 2007 Genes encoding pentatricopeptide repeat (PPR) proteins are not conserved in location in plant genomes and may be subject to diversifying selection. BMC Genomics 8: 130.
Gouyon, P. H., and D. Couvet, 1987 A conflict between two sexes, females and hermaphrodites. Experientia Suppl. 55: 245–261.
Hanson, M. R., 1991 Plant mitochondrial mutations and male sterility. Annu. Rev. Genet. 25: 461–486.
Hansson, B., A. Kawabe, S. Preuss, H. Kuittinen and D. Charlesworth, 2006 Comparative gene mapping in Arabidopsis lyrata chromosomes 1 and 2 and the corresponding A. thaliana chromosome 1: recombination rates, rearrangements and centromere location. Genet. Res. 87: 75–85.
Hudson, R. R., 2002 Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337–338.
Hudson, R. R., M. Kreitman and M. Aguade, 1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116: 153–159.
Hudson, R. R., M. Slatkin and W. P. Maddison, 1992 Estimation of levels of gene flow from DNA sequence data. Genetics 132: 583–589.
Kazama, T., and K. Toriyama, 2003 A pentatricopeptide repeat-containing gene that promotes the processing of aberrant atp6 RNA of cytoplasmic male-sterile rice. FEBS Lett. 544: 99–102.
Kimura, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.
Koizuka, N., R. Imai, H. Fujimoto, T. Hayakawa, Y. Kimura et al., 2003 Genetic characterization of a pentatricopeptide repeat protein gene, orf687, that restores fertility in the cytoplasmic male-sterile Kosena radish. Plant J. 34: 407–415.
Komori, T., S. Ohta, N. Murai, Y. Takakura, Y. Kuraya et al., 2004 Map-based cloning of a fertility restorer gene, Rf-1, in rice (Oryza sativa L.). Plant J. 37: 315–325.
Kotera, E., M. Tasaka and T. Shikanai, 2005 A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature 433: 326–330.
Lurin, C., C. Andres, S. Aubourg, M. Bellaoui, F. Bitton et al., 2004 Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell 16: 2089–2103.
Nakamura, T., G. Schuster, M. Sugiura and M. Sugita, 2004 Chloroplast RNA-binding and pentatricopeptide repeat proteins. Biochem. Soc. Trans. 32: 571–574.
Nordborg, M., T. T. Hu, Y. Ishino, J. Jhaveri, C. Toomajian et al., 2005 The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3: e196.
Okuda, K., A. L. Chateigner-Boutin, T. Nakamura, E. Delannoy, M. Sugita et al., 2009 Pentatricopeptide repeat proteins with the DYW motif have distinct molecular functions in RNA editing and RNA cleavage in Arabidopsis chloroplasts. Plant Cell 21: 146–156.
O’Toole, N., M. Hattori, C. Andres, K. Iida, C. Lurin et al., 2008 On the expansion of the pentatricopeptide repeat gene family in plants. Mol. Biol. Evol. 25: 1120–1128.
Popadin, K., L. V. Polishchuk, L. Mamirova, D. Knorre and K. Gunbin, 2007 Accumulation of slightly deleterious mutations in mitochondrial protein-coding genes of large versus small mammals. Proc. Natl. Acad. Sci. USA 104: 13390–13395.
Rand, D. M., R. A. Haney and A. J. Fry, 2004 Cytonuclear coevolution: the genomics of cooperation. Trends Ecol. Evol. 19: 645–653.
Ross-Ibarra, J., S. I. Wright, J. P. Foxe, A. Kawabe, L. DeRose-Wilson et al., 2008 Patterns of polymorphism and demographic history in natural populations of Arabidopsis lyrata. PLoS One 3: e2411.
Rozas, J., J. C. Sanchez-DelBarrio, X. Messeguer and R. Rozas, 2003 DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496–2497.
Schmitz-Linneweber, C., and I. Small, 2008 Pentatricopeptide repeat proteins: a socket set for organelle gene expression. Trends Plant Sci. 13: 663–670.
Schnable, P. S., and R. P. Wise, 1998 The molecular basis of cytoplasmic male sterility and fertility restoration. Trends Plant Sci. 3: 175–180.
Small, I. D., and N. Peeters, 2000 The PPR motif: a TPR-related motif prevalent in plant organellar proteins. Trends Biochem. Sci. 25: 46–47.
Tajima, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.
Thornton, K., 2003 Libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19: 2325–2327.
Thornton, K. R., and J. D. Jensen, 2007 Controlling the false-positive rate in multilocus genome scans for selection. Genetics 175: 737–750.
Wang, Z., Y. Zou, X. Li, Q. Zhang, L. Chen et al., 2006 Cytoplasmic male sterility of rice with boro II cytoplasm is caused by a cytotoxic peptide and is restored by two related PPR motif genes via distinct modes of mRNA silencing. Plant Cell 18: 676–687.
Wright, S., 1969 The Theory of Gene Frequencies. University of Chicago Press, Chicago.
Wright, S. I., and B. Charlesworth, 2004 The HKA test revisited: a maximum-likelihood-ratio test of the standard neutral model. Genetics 168: 1071–1076.
Wright, S. I., and B. S. Gaut, 2005 Molecular population genetics and the search for adaptive evolution in plants. Mol. Biol. Evol. 22: 506–519.
Wright, S. I., N. Nano, J. P. Foxe and V. U. Dar, 2008a Effective population size and tests of neutrality at cytoplasmic genes in Arabidopsis. Genet. Res. 90: 119–128.
Wright, S. I., R. W. Ness, J. P. Foxe and S. C. H. Barrett, 2008b Genomic consequences of outcrossing and selfing in plants. Int. J. Plant Sci. 169: 105–118.
Communicating editor: O. Savolainen
Supporting Information http://www.genetics.org/cgi/content/full/genetics.109.104778/DC1
Signature of Diversifying Selection on Members of the Pentatricopeptide Repeat Protein Family in Arabidopsis lyrata
John Paul Foxe and Stephen I. Wright
Copyright © 2009 by the Genetics Society of America DOI: 10.1534/genetics.109.104778
FIGURE S1.—πsyn from 10,000 coalescent simulations under the best demographic model (see text). The first panel shows the mean for the PPR gene family and the remaining panels show values for each individual PPR locus. Observed estimates are indicated by arrows.
FIGURE S2.—Tajima’s Dsyn from 10,000 coalescent simulations under the best demographic model (see text). The first panel shows the mean for the PPR gene family and the remaining panels show values for each individual PPR locus. Observed estimates are indicated by arrows.
TABLE S1
List of gene fragments sequenced and their sample size in A. lyrata in this study. PPR loci are given in bold.
Locus Sample Size
1g03560-1053744* 122
1g03560-1054749 24
1g03590-1055816 20
1g03590-1056303 20
1g03590-1056868 22
1g59710-3772518 24
1g59710-3773200 24
1g59720-3770925* 82
1g59740-3722719 74
1g74580-16338131 98
1g74580-16339064 94
1g74600-16342481* 112
1g74630-16350229 106
1g74640-16352834 82
2g28040-12173691 18
2g28050-12170614 20
2g28050-12171388* 126
2g28050-12171510 24
2g36970-17628199 24
2g36970-17629020 16
2g36980-17630803 20
2g36980-17631328* 118
2g36980-17631688 18
3g62890-20949853 14
3g62890-20950758 20
3g62890-20951026* 116
4g14170-14309884 70
4g14180-14302891 22
4g14180-14304679 22
4g14180-14304802 14
4g14180-14307137 6
4g14190-14301512* 110
4g14280-14214623 108
4g14280-14215399 88
* Fragments surveyed in Ross-Ibarra et al. (2008).
TABLE S2
List of A. thaliana accessions used in this study, the number of individuals used per accession, and their region of origin.
Accession Number of individuals used Region of origin
Col-0 1 unknown
RRS-7 2 U.S. Midwest (Indiana)
RRS-10 2 U.S. Midwest (Indiana)
KNO-10 2 U.S. Midwest (Indiana)
KNO-18 2 U.S. Midwest (Indiana)
RMX-A02 2 U.S. Midwest (Michigan)
RMX-A180 2 U.S. Midwest (Michigan)
PNA-17 2 U.S. Midwest (Michigan)
PNA-10 2 U.S. Midwest (Michigan)
Eden-1 2 North Sweden
Eden-2 2 North Sweden
Lov-1 2 North Sweden
Lov-5 2 North Sweden
Fab-2 2 North Sweden
Fab-4 2 North Sweden
Bil-5 2 North Sweden
Bil-7 2 North Sweden
Var-2-1 2 South Sweden
Var-2-6 2 South Sweden
Spr-1-2 2 South Sweden
Spr-1-6 2 South Sweden
Omo-2-1 2 South Sweden
Omo-2-3 2 South Sweden
Ull-2-5 2 South Sweden
Ull-2-3 2 South Sweden
Zdr-1 2 Central Europe (Czech Republic)
Zdr-6 2 Central Europe (Czech Republic)
Bor-1 2 Central Europe (Czech Republic)
Bor-4 2 Central Europe (Czech Republic)
Pu2-7 2 Croatia
Pu2-23 2 Croatia
LP2-2 2 Central Europe (Czech Republic)
LP2-6 2 Central Europe (Czech Republic)
HR-5 2 England
HR-10 2 England
NFA-8 2 England
NFA-10 2 England
SQ-1 2 England
SQ-8 2 England
CIBC-5 2 England
CIBC-17 2 England
TAMM-2 2 Finland
TAMM-27 2 Finland
KZ-1 2 Kazakhstan
KZ-9 2 Kazakhstan
GOT-7 2 Germany
GOT-22 2 Germany
REN-1 1 France
TABLE S3
Levels of diversity and differentiation at PPR genes and genome averages in A. lyrata for each population used in this study.
ICELAND πSyn^a πNonSyn TajDSyn^b TajDNonSyn FstSyn^c FstNonSyn
At1g03560 0.033775851 0.005474173 -0.739899364 -0.564630455 0.405816169 0.409731152
At1g59720 0.074805928 0.01504378 0.75202693 0.485299508 0.123630212 0.319751477
At1g74600 0.004229267 0 1.430241391 0 0.953055288 1
At2g28050 0.017747235 0.004551121 0.827003785 1.457218245 0.715840773 0.64641257
At2g36980 0.016461376 0.006844992 0.958335646 1.628429046 0.145273562 0.174847058
At3g62890 0.038829992 0.00847152 2.301453559 2.114134733 0.317772503 0.396555055
At4g14190 0.016904698 0.001912616 1.855417475 -0.549509032 0.753883062 0.751469994
PPR average 0.028964907 0.0060426 1.054939917 0.652991721 0.487895938 0.528395329
permuted mean 0.012334615 0.002179123 0.472634204 0.162742991 0.504968684 0.270315738
GERMANY πSyn πNonSyn TajDSyn TajDNonSyn FstSyn FstNonSyn
At1g03560 0.049687564 0.008366133 0.8494825 0.099031862 0.125897762 0.097897028
At1g59720 0.087298403 0.014764863 0.171142575 0.140275284 -0.022722193 0.332363516
At1g74600 0.094175947 0.012471549 2.942282427 2.169098439 -0.045349598 -0.050215477
At2g28050 0.068183156 0.017634871 0.924537954 0.547322928 -0.091712212 -0.370095153
At2g36980 0.015008338 0.007870927 0.283235749 1.165949245 0.220719884 0.051172265
At3g62890 0.014821732 0.007586172 -0.300815438 1.132377452 0.739588077 0.459620329
At4g14190 0.068618236 0.006499866 1.088525427 0.443364193 0.000981252 0.155391567
PPR average 0.056827625 0.010742055 0.851198742 0.813917058 0.132486139 0.096590582
permuted mean 0.021856224 0.004113937 0.330829138 0.167145305 -0.369993163 -0.148292811
CANADA πSyn πNonSyn TajDSyn TajDNonSyn FstSyn FstNonSyn
At1g03560 0.001025392 0.00059795 -1.164672189 -1.507756012 0.981961334 0.935524317
At1g59720 0.061203555 0.013007877 0.752859064 0.956875828 0.28298535 0.411810758
At1g74600 0 0.00073586 0 0.138693107 1 0.938034075
At2g28050 0 0 0 0 1 1
At2g36980 0 0 0 0 1 1
At3g62890 0.003580035 0 0.593481941 0 0.937100206 1
At4g14190 0.01897986 0.002465489 -2.320883623 -1.327164848 0.723670602 0.679628402
PPR average 0.012112692 0.002401025 -0.305602115 -0.248478846 0.84653107 0.852142507
permuted mean 0.005373627 0.001248186 0.201420545 0.144104985 0.716856115 0.731617506
US πSyn πNonSyn TajDSyn TajDNonSyn FstSyn FstNonSyn
At1g03560 0.001747835 0.001048113 -0.591550125 -1.440706444 0.969252128 0.886984162
At1g59720 0.00870042 0.001574731 -0.706252211 -1.001802986 0.898072442 0.928793919
At1g74600 0 0 0 0 1 1
At2g28050 0 0 0 0 1 1
At2g36980 0.006919908 0.003609256 -0.948945187 -1.077765584 0.640696586 0.564909874
At3g62890 0 0.001140876 0 0.649980502 1 0.918732918
At4g14190 0.02055604 0.002184146 -1.024325077 0.906187075 0.700722867 0.716186774
PPR average 0.005417743 0.001365303 -0.467296086 -0.280586777 0.886963432 0.859372521
permuted mean 0.005555557 0.001277209 0.359703258 0.182018719 0.711436741 0.713918109
SWEDEN πSyn πNonSyn TajDSyn TajDNonSyn FstSyn FstNonSyn
At1g03560 0.033678523 0.005725796 -0.739899364 -0.427556872 0.40752836 0.382599204
At1g59720 0.031314541 0.002848818 1.584413719 1.641452667 0.633142475 0.87118234
At1g74600 0.002721451 0.00254079 0.155745965 1.687095109 0.969791988 0.78604289
At2g28050 0.029933722 0.008722971 0.344450284 1.167481188 0.520717254 0.322291604
At2g36980 0.007346207 0.000798243 -0.237849822 0.021929652 0.618561844 0.903773037
At3g62890 0.007264957 0.001429532 -2.003179165 -1.83087563 0.872357595 0.898171271
At4g14190 0.0744422 0.006294607 2.11928621 0.930998555 -0.083810329 0.182063413
PPR average 0.026671657 0.004051537 0.17470969 0.455789239 0.562612741 0.620874823
permuted mean 0.009068391 0.001655283 0.318069453 0.303681742 0.537710295 0.569044586
RUSSIA πSyn πNonSyn TajDSyn TajDNonSyn FstSyn FstNonSyn
At1g03560 0.04037556 0.004951227 0.321201768 0.112944567 0.289714279 0.46611939
At1g59720 0.012501114 0.001675559 -2.21100974 -1.928604462 0.853546382 0.924234689
At1g74600 0 0 0 0 1 1
At2g28050 0 0 0 0 1 1
At2g36980 0.010187655 0.005203496 1.097275256 1.411195865 0.471024918 0.372726805
At3g62890 0.031165105 0.003379609 1.645761699 0.3476094 0.452441516 0.759263079
At4g14190 0.009422925 0.002747822 -1.335875919 -1.060051909 0.862810825 0.642941356
PPR average 0.01480748 0.002565387 -0.068949562 -0.159558077 0.704219703 0.737897903
permuted mean 0.006970173 0.000997386 0.270243122 0.070909296 0.596309547 0.721937385
a Synonymous and nonsynonymous nucleotide diversity as measured by πSyn and πNonSyn, where π is the average number of pairwise differences between two individuals.
b Frequency of variants in each of the PPR genes and genome averages in A. lyrata as measured by Tajima’s D calculated for synonymous and nonsynonymous sites.
c Synonymous and nonsynonymous pairwise population differentiation estimates as measured by the population differentiation parameter Fst calculated using πSyn and πNonSyn.
d Statistically significant values are marked in bold.
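For readers unfamiliar with how a differentiation estimate can be built from pairwise differences, the following sketch implements the estimator of Hudson et al. (1992), Fst = 1 - Hw/Hb, where Hw and Hb are the mean numbers of differences within and between populations. It is an assumption of this illustration that this form corresponds to the π-based Fst in the table; the function names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def mean_diffs(pairs):
    """Mean number of differences over an iterable of sequence pairs."""
    pairs = list(pairs)
    return sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)


def hudson_fst(pop1, pop2):
    """Fst = 1 - Hw/Hb for two populations, each with at least two aligned sequences."""
    within = (mean_diffs(combinations(pop1, 2)) + mean_diffs(combinations(pop2, 2))) / 2
    between = mean_diffs(product(pop1, pop2))
    if between == 0:
        raise ValueError("Fst is undefined when there are no between-population differences")
    return 1 - within / between


# Toy populations (hypothetical): fixed differences vs. shared polymorphism.
print(hudson_fst(["AAAA", "AAAA"], ["TTTT", "TTTT"]))
print(hudson_fst(["AAAA", "TTTT"], ["AAAA", "TTTT"]))
```

The second toy case yields a negative estimate because within-population differences exceed between-population differences, which is how negative values such as those in the table can arise with an estimator of this form.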