Research Article
Genome-wide identification of PPR gene family and prediction analysis on
restorer gene in Gossypium
NAN ZHAO 1, YUMEI WANG 1 †, and JINPING HUA ∗
Laboratory of Cotton Genetics, Genomics and Breeding /Key Laboratory of Crop Heterosis and
Utilization of Ministry of Education /Beijing Key Laboratory of Crop Genetic Improvement, College of
Agronomy and Biotechnology, China Agricultural University, Beijing 100193, China
† Research Institute of Cash Crops, Hubei Academy of Agricultural Sciences, Wuhan 430064, Hubei,
China
1 These authors contributed equally to this work.
Email:
Nan Zhao: [email protected]
Yumei Wang: [email protected]
Jinping Hua: [email protected]
∗For correspondence E-mail: [email protected]
Running title
PPR gene evolution in Gossypium species.
Keywords. Gossypium; PPR gene family; phylogenetic analysis; cytoplasmic male sterility;
restorer gene.
Abstract
The PPR (pentatricopeptide repeat) gene family plays an essential role in the regulation of plant
growth and organelle gene expression. Some PPR genes are related to fertility restoration in
plants, but no detailed information is available for Gossypium. In the present study, we identified 482
and 433 PPR homologs in the G. raimondii (D5) and G. arboreum (A2) genomes, respectively. Most PPR
homologs were evenly distributed across the chromosomes. In an evolutionary
analysis of PPR genes from the G. raimondii (D5), G. arboreum (A2) and G. hirsutum genomes, 8
PPR genes clustered together with restorer genes of other species. Most cotton PPR
genes had no introns, a high proportion of α-helix and the classical tertiary structure
of PPR proteins. Based on bioinformatics analyses, the 8 PPR genes were predicted to be targeted to the
mitochondrion and to encode typical P-subfamily proteins with protein-binding activity and
functions in organelle RNA metabolism. As further verified by RNA-seq and qRT-PCR analyses,
2 PPR candidate genes, Gorai.005G0470 (D5) and Cotton_A_08373 (A2), were up-regulated
in the fertile line relative to the sterile line. These results provide new insights into PPR gene evolution in
Gossypium.
Introduction
The cotton genus, Gossypium, is home to the most important fiber crop plants in the world,
with four of the ~53 species cultivated, two diploid and two allotetraploid. The genus originated
approximately 5-10 million years ago (Mya), subsequently diversifying into ~46 diploid species
(allocated into 8 monophyletic genome groups, designated A-G and K) and 7 allotetraploid species
(Wendel and Grover 2015; Chen et al. 2016; Chen et al. 2017c; Gallagher et al. 2017).
Allopolyploid Gossypium is the result of the transoceanic dispersal of an A-genome species
(resembling G. arboreum, A2), which subsequently hybridized with a native D-genome species
(resembling G. raimondii, D5) in the New World and experienced chromosome doubling (Wendel
1989; Chen et al. 2017a; Chen et al. 2017b).
Heterosis is widely exploited in crop plants to increase yield potential and
improve quality, for example through the three-line system (sterile line, maintainer line and
restoring line) used to develop hybrid cotton (Bentolila et al. 2002). It is well known that fertility of plants is
co-determined by mitochondrial and nuclear genes (Dewey et al. 1987; Schnable and Wise
1998; Carlsson et al. 2008; Galtier 2011; Suzuki et al. 2013). Most nuclear restoring genes
were reported as homologs of PPR (Pentatricopeptide repeat) gene family, such as Rf-PPR592
in petunia (Bentolila et al. 2002; Koizuka et al. 2003; Gillman et al. 2007), Rfo in
CMS-Ogura radish (Brown et al. 2003; Desloire et al. 2003; Koizuka et al. 2003) and another
tightly linked restoring gene RsRf (Wang et al. 2013); similarly, PPR-like Rf genes were also
identified, such as Rf1 in CMS-BT rice (Kazama and Toriyama 2003; Akagi et al. 2004;
Komori et al. 2004; Wang et al. 2006), Rf5 in the CMS-HL rice (Hu et al. 2012), Rf3 in
CMS-S maize (Zabala et al. 1997), and PPR13 in A1 sorghum (candidate gene of Rf1) (Klein
et al. 2005). In addition, there exist restoring genes encoding non-PPR proteins, such as Rf2 in
CMS-T maize (Cui et al. 1996; Liu et al. 2001), Rf2 in CMS-LD rice (Itabashi et al. 2011),
Rf17 in CMS-CW rice (Fujii and Toriyama 2009) and Rf1 (bvORF20) in sugar beet (Matsuhira et
al. 2012).
PPR proteins consist of tandem arrays of degenerate 35-amino-acid PPR motifs,
some residues of which are highly conserved (Small and Peeters 2000), and
evolved from earlier TPR (tetratricopeptide repeat) ancestors (Barkan and Small 2014). PPR
genes are widespread in plants (Lurin et al. 2004; Wang et al. 2006), and PPR gene families
have had a significant influence on the plant organellar genome evolution, especially
organelle-specific RNA metabolism (Germain et al. 2013). PPR gene families are divided into
two subfamilies, the PLS and P subfamilies. The PLS subfamily is further subdivided into four
groups: the PLS, E, E+ and DYW groups (Lurin et al. 2004). Most PPR genes
contain no introns and encode organelle-targeting peptides at the N-terminus
(Lurin et al. 2004). PPR gene functions fall mainly into four categories: 1) to regulate the
expression of chloroplast and mitochondrial genes, such as HCF152 in A. thaliana (Meierhoff
et al. 2003; Nakamura et al. 2003); 2) to participate in plant-specific RNA metabolism
(mainly PPR genes of PLS subfamily), such as CRR4 in A. thaliana (Hashimoto et al. 2003;
Howell et al. 2007); 3) to regulate the embryonic development of higher plants, such as EMB175
in A. thaliana (Cushing et al. 2005); and 4) to affect the fertility restoration of cytoplasmic
male sterility in plants, such as Rf1 in Oryza sativa (Wang et al. 2006). Unlike other
PPR genes, those that serve as restorer genes usually cluster together
with homologous sequences (known as Rf-like or RFL genes), which leads to a
distinctive, dynamic pattern of evolution (Geddy and Brown 2007; O'Toole et al. 2008; Fujii et al.
2011).
The CMS line of cotton with G. harknessii sterile cytoplasm (CMS-D2-2) shows sporophytic
sterility, which is restored by a single dominant gene, Rf1 (Zhang and Stewart 2004). Rf1 was
located on chromosome 19 (Li et al. 2007), namely, the LGD08 linkage group in the Dt sub-genome
(chromosome D05) (Wang et al. 2009). The latest comparison between the chromosomes of
allotetraploid cotton and those of diploid G. raimondii indicated that chromosome 19
(chromosome D05) of allotetraploid cotton corresponds to chromosome 9 of G. raimondii
(Zhao et al. 2012; Zhang et al. 2014). Subsequently, an Rf1 candidate gene, Cotton_D_gene_10013437,
was shown to carry a 9-nt insertion in the 3’ UTR and a SNP in the restoring line compared with
non-restoring lines (Wu et al. 2014). Up to now, restorer genes for cytoplasmic male sterility in
plants have mainly been obtained through map-based cloning, and some progress has been made in
screening and mapping molecular markers associated with cotton restorer genes. With the
rapid growth of high-throughput biological data, whole-genome and transcriptome sequencing
combined with bioinformatics analysis may be a feasible way to explore the fertility restorer
genes of cotton cytoplasmic male sterility (CMS). Given the
close relationship of PPR genes to Rf genes in other species, we identified the PPR gene families in the G.
arboreum (A2) and G. raimondii (D5) genomes. From an evolutionary perspective, we then
obtained candidate cotton PPR genes that cluster with Rf-PPR genes of other species. In
addition, we analyzed the evolutionary pressure, functional annotation, subfamily
classification and subcellular localization of these PPR genes. Finally, the differential expression
of the PPR candidate genes was analyzed in sterile and fertile cotton materials using RNA-seq
transcriptome data and qRT-PCR validation. We expect that the results will lay a foundation for
further research on the molecular mechanisms of the interaction between the restorer gene and
CMS-D2 cytoplasm in cotton.
Materials and methods
Plant materials
The plant materials were the G. harknessii CMS line 2074A, the G. hirsutum CMS line 2074S, their
maintainer 2074B, and two fertile F1 hybrids derived from crossing each CMS line with the
restorer line E5903 (Li et al. 2013).
2074A, a CMS line with G. harknessii (D2-2) male-sterile cytoplasm, was bred by
backcrossing the original sterile line DES-HAMS277 (Meyer 1975) for more than 20 generations
with the upland cotton cultivar ‘2074B’ (Lei et al. 2013).
2074S, a CMS line with G. hirsutum (AD1) male-sterile cytoplasm, was derived by
hybridizing line X658 with G. hirsutum L. (AD1) and backcrossing for 17 generations.
2074B, the maintainer of 2074A and 2074S, carries G. hirsutum fertile cytoplasm and is a cultivar
of upland cotton, ‘Sumian 20’.
E5903, a nuclear restorer line with a normal nuclear genome and normal fertile G. harknessii
cytoplasm, originated from DES-HAF277 (Meyer 1975) by inbreeding for more than 30
generations.
F1, the fertile F1 materials, were generated by crossing the CMS lines 2074A and
2074S with the restorer line E5903 (FA, 2074A × E5903; FS, 2074S × E5903).
Identification of PPR gene family and chromosome localization analysis
The genome sequences, CDS sequences and amino acid sequences of G. raimondii, G.
arboreum and G. hirsutum were downloaded from Phytozome (http://www.phytozome.net/)
and the Cotton Genome Project (CGP) (http://cgp.genomics.org.cn/). To identify
members of the PPR protein family in the Gossypium genome assemblies, all available PPR
domain sequences from the Pfam database (http://pfam.xfam.org) were collected and used to
build a Hidden Markov Model (HMM) profile with the hmmbuild
program of the HMMER package (v3.1b1, http://hmmer.org). This HMM profile was
used to identify members of the PPR family among the cotton amino acid sequences obtained from
the high-quality draft genomes of G. raimondii, G. arboreum and G. hirsutum
(Paterson et al. 2012; Li et al. 2014; Zhang et al. 2015). Sequences containing 10
or more P-class PPR motifs were retained for further analyses, as a previous study has shown
that RFL genes are primarily composed of tandem arrays of 15 to 20 PPR motifs (Fujii et al.
2011). The locations of PPR genes on chromosomes were determined by local BLAST
(Altschul et al. 1990).
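The retention criterion above (10 or more PPR motifs per sequence) can be illustrated with a minimal sketch. It assumes that the motif hits were written by hmmsearch to a per-domain table (the `--domtblout` format, in which the first whitespace-separated field is the target name and the 13th is the independent E-value of each domain). The protein names, the E-value cut-off of 1e-3 and the helper `example_row` are hypothetical and only make the example self-contained; this is not the pipeline used in this study.

```python
from collections import Counter

# 0-based index of the per-domain independent E-value (i-Evalue) in a
# whitespace-separated HMMER3 --domtblout line.
I_EVALUE_COL = 12


def count_motif_hits(domtbl_lines, max_i_evalue=1e-3):
    """Count profile hits per protein; the cut-off is an assumed value."""
    counts = Counter()
    for line in domtbl_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if float(fields[I_EVALUE_COL]) <= max_i_evalue:
            counts[fields[0]] += 1  # fields[0] is the target (protein) name
    return counts


def select_candidates(counts, min_motifs=10):
    """Keep proteins carrying at least `min_motifs` motif hits."""
    return sorted(name for name, n in counts.items() if n >= min_motifs)


def example_row(protein, index, total, i_evalue):
    """Build a synthetic domtblout-like line (hypothetical values)."""
    start = index * 35
    return (f"{protein} - 600 PPR - 31 1e-40 140.0 0.5 {index} {total} "
            f"1e-9 {i_evalue:g} 15.0 0.1 1 31 {start} {start + 30} "
            f"{start} {start + 30} 0.90 -")


# Hypothetical input: protA carries 12 confident hits, protB only 4
# confident hits plus one weak hit that fails the E-value cut-off.
rows = [example_row("protA", i, 12, 1e-6) for i in range(1, 13)]
rows += [example_row("protB", i, 5, 1e-6) for i in range(1, 5)]
rows.append(example_row("protB", 5, 5, 0.5))

print(select_candidates(count_motif_hits(rows)))
```

Counting hits per protein before applying the threshold keeps the two decisions (hit quality and motif number) separate, so either cut-off can be changed without touching the other.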
Phylogenetic analysis
The amino acid sequences of 6 Rf-PPR genes from 5 plant species (rapeseed
(PPR_B_L1), radish (Rfo_PPR B), Arabidopsis (RPF1), petunia (Rf_PPR592) and rice (Rf1a and
Rf1b)) were downloaded from the NCBI database. We performed 26 separate
single-chromosome phylogenetic analyses of PPR genes in the G. raimondii (D5) and G.
arboreum (A2) genomes, each including the 6 Rf-PPR genes mentioned above, using amino acid sequences.
The PPR genes on each chromosome of D5 and A2 that clustered with the 6
confirmed Rf-PPR genes were subsequently used in a comprehensive analysis together with the 6
Rf-PPR genes and 15 G. hirsutum (AD1) PPR protein sequences retrieved from the
NCBI database. As an important supplement, PPR genes predicted on chromosome D05 of the G.
hirsutum (AD1) genome (data unpublished) were also used for a single-chromosome
phylogenetic analysis. First, amino acid sequences were aligned with MAFFT (Katoh and
Standley 2013) using default parameters. Then, phylogenetic trees were built
with the Maximum Likelihood method under the GTR + G + R model in MEGA 5.05 (Tamura
et al. 2011), with 1000 bootstrap replicates.
Selective constraints analysis
Homologous PPR gene pairs between the G. raimondii (D5) and G. arboreum (A2) genomes were
identified as the BLAST alignments with the highest identity. The alignment FASTA files were
converted to PAML format using DAMBE (Xia and Xie 2001). The non-synonymous
substitution rate (dN), the synonymous substitution rate (dS) and the dN/dS ratio
were calculated using the yn00 program in PAML (Yang and Nielsen 2000). GO annotations were
carried out with agriGO (http://bioinfo.cau.edu.cn/agriGO/analysis.php). Taking the cotton genome
loci (Phytozome) as the reference, we functionally annotated the corresponding PPR
genes in the G. raimondii (D5) genome by Singular Enrichment Analysis (SEA), using the
hypergeometric test and the Yekutieli (FDR under dependency) multiple-testing
adjustment. The significance level was set to 0.05.
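How the dN/dS ratio is usually read can be shown with a short sketch. The rates themselves come from yn00; the code below only forms the ratio and applies the conventional interpretation (below 1, purifying selection; above 1, positive selection). The function names are hypothetical. The input values are the rounded dN and dS of rows 1 and 9 of Table S3, so the recomputed ratios differ slightly from the dN/dS values listed there.

```python
def omega(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


def selection_mode(w):
    """Conventional reading of a dN/dS ratio."""
    if w is None:
        return "undefined"
    if w < 1:
        return "purifying"
    if w > 1:
        return "positive"
    return "neutral"


# Rounded rates taken from Table S3 (row 9 and row 1).
pairs = {
    "Cotton_A_01860 vs. Gorai.006G1315": (0.023, 0.025),
    "Cotton_A_11317 vs. Gorai.009G1666": (0.017, 0.009),
}
for name, (dn, ds) in pairs.items():
    w = omega(dn, ds)
    print(f"{name}: dN/dS = {w:.3f} ({selection_mode(w)})")
```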
GO annotation and subcellular localization analysis
GO (Gene Ontology) annotation of PPR genes was performed with Blast2GO. First, the CDS
sequences of the PPR candidate genes were aligned against the nr database and then annotated. The
E-value cut-off was set to 1e-6. Subcellular localization was predicted using the TargetP 1.1
Server (http://www.cbs.dtu.dk/services/TargetP/), Predotar
(https://urgi.versailles.inra.fr/predotar/predotar.html) and ProtComp v. 9.0
(http://linux1.softberry.com/berry.phtml).
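One simple way to combine the calls of the three predictors is a majority vote, sketched below. This rule is an assumption made for illustration: the text does not state how the three outputs were reconciled, and the function name, the label strings and the example predictions are hypothetical.

```python
from collections import Counter


def consensus_location(predictions):
    """Return the compartment named by at least two predictors, else 'unsure'.

    `predictions` maps a program name to its predicted compartment.
    """
    if not predictions:
        raise ValueError("no predictions supplied")
    compartment, votes = Counter(predictions.values()).most_common(1)[0]
    return compartment if votes >= 2 else "unsure"


# Hypothetical predictions for a single protein.
example = {
    "TargetP": "mitochondrion",
    "Predotar": "mitochondrion",
    "ProtComp": "chloroplast",
}
print(consensus_location(example))
```

Requiring two agreeing programs keeps single-program calls from deciding the outcome; proteins on which all three programs disagree are left undetermined.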
Subfamily analysis
The PPR motifs of each PPR gene were identified with HMM profiles (defined on the 7
conserved motifs of the PPR gene family in Arabidopsis: P, L, L2, S, E, E+ and DYW) using
HMMER 3.0 (Mistry et al. 2013). Subsequently, the arrangement of PPR motifs in each
sequence was inspected manually. The E-value threshold in hmmsearch was set to
1e-10.
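The assignment of a sequence to a subfamily from its motif arrangement can be sketched as follows. This is a simplified reading of the classification of Lurin et al. (2004), in which proteins containing only P motifs form the P subfamily and PLS-subfamily proteins are grouped by their C-terminal E, E+ and DYW motifs. The function name and the example motif lists are hypothetical, and the manual inspection described above is not reproduced.

```python
def classify_ppr(motifs):
    """Assign a PPR protein to a subfamily/group from its ordered motif list.

    Motif labels follow the 7 profiles named in the text:
    P, L, L2, S, E, E+ and DYW.
    """
    present = set(motifs)
    if not present:
        raise ValueError("no PPR motifs found")
    # C-terminal motifs define the groups of the PLS subfamily.
    if "DYW" in present:
        return "DYW group"
    if "E+" in present:
        return "E+ group"
    if "E" in present:
        return "E group"
    if present & {"L", "L2", "S"}:
        return "PLS group"
    return "P subfamily"


# Hypothetical motif arrangements.
print(classify_ppr(["P"] * 15))
print(classify_ppr(["P", "L", "S", "P", "L2", "S", "E", "E+", "DYW"]))
```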
RNA-seq and qRT-PCR analyses
RNA-seq data from young buds of the CMS line 2074A, the maintainer line 2074B and the fertile material
F1 (2074A × E5903 (restoring line)) (unpublished) were used to analyze the expression of
PPR genes. Expression was estimated as RPKM (reads per kilobase of exon model
per million mapped reads) values. The diagram was drawn with the gplots package in R.
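The RPKM value follows directly from its definition: the read count of a gene divided by its exon length in kilobases and by the number of mapped reads in millions. A minimal sketch is given below; the function name and the input numbers are hypothetical.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # reads / (length / 1e3) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2000-bp exon model, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))
```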
Total RNA was extracted from young buds of 6 cotton materials using an improved CTAB-SDS method:
the CMS lines 2074A and 2074S, their maintainer line 2074B, the restorer line E5903, and the
fertile F1 hybrids (FA, 2074A × E5903; FS, 2074S × E5903). Genomic DNA digestion
and reverse transcription were carried out using the PrimeScript™ RT reagent Kit with gDNA
Eraser (Perfect Real Time) RR047A (TaKaRa). The primers used for qRT-PCR were designed
with Primer Premier 5 and synthesized by Sangon Biotech (Additional file 5: Table S5). Real-time
PCR was performed using the SYBR® Premix Ex Taq™ II (Tli RNaseH Plus)
RR820A kit (TaKaRa) on an Applied Biosystems 7500 Real-Time PCR System. The procedure
comprised 3 stages: stage 1, 95 °C for 30 s, 1 cycle; stage 2, 95 °C for 5 s and 60 °C for 35 s, 40
cycles; stage 3, 95 °C for 15 s, 60 °C for 1 min and 95 °C for 35 s, 1 cycle. Taking the cotton
housekeeping gene UBQ7 as the internal control, we analyzed the relative expression of the 8 PPR
candidate genes using the 2^(-ΔΔCt) method. Each sample was run in 3 replicates.
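The relative-expression calculation can be written out as a short sketch. It follows the standard 2^(-ΔΔCt) formula, with UBQ7 as the reference gene and 2074A as the control sample (as stated in the legend of Figure 6). The function name and the Ct values are hypothetical, and averaging over the 3 replicates is left out.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method."""
    # dCt: target Ct normalized to the reference gene within each sample.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # ddCt: normalized sample relative to the normalized control.
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: target gene vs. UBQ7 in FA (sample) and 2074A (control).
print(relative_expression(24.0, 18.0, 26.0, 18.0))
```

A value above 1 means higher expression in the sample than in the control, and the control itself always evaluates to 1.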
Results and Discussion
Identification and chromosome distribution of PPR gene family
In total, 482 and 433 PPR genes were identified in G. raimondii (D5) and G. arboreum (A2), respectively,
by genome-wide analyses (table 1). The number of PPR genes varied among the 13
chromosomes of both the G. raimondii (D5) and G. arboreum (A2) genomes. The
largest numbers of PPR genes on a single chromosome were 75 in G. raimondii (D5)
and 49 in G. arboreum (A2), located on chromosome 9 and chromosome 6, respectively (figure
1). Chromosome 3 and chromosome 2 contained the fewest PPR genes in the G.
raimondii (D5) and G. arboreum (A2) genomes, with 20 and 18, respectively (figure 1).
Overall, PPR genes in the two cotton species were evenly distributed, as has been observed on the
5 chromosomes of Arabidopsis (Aubourg et al. 2000; Lurin et al. 2004). However, some PPR
genes clustered on certain chromosomes, such as chromosomes 4, 5, 6 and 10 in the G. raimondii (D5)
genome and chromosomes 4 and 5 in the G. arboreum (A2) genome. Such clustered PPR
genes are typically involved in Rf loci, as has been observed in other plants (Bentolila et
al. 2002; Brown et al. 2003; Giancola et al. 2003; Komori et al. 2004; Wang et al. 2006).
Phylogenetic analyses
Restorer of fertility-like (RFL) PPR genes have been reported in several plant species, such as
Rf-PPR592 in petunia, Rfo in radish, RPF1 in Arabidopsis, and Rf1a and Rf1b in rice (Bentolila et
al. 2002; Brown et al. 2003; Giancola et al. 2003; Komori et al. 2004; Wang et al. 2006).
Taking those Rf genes as outgroups, we performed 26 single-chromosome phylogenetic analyses of
PPR genes in the G. raimondii (D5) and G. arboreum (A2) genomes separately (Additional file 1:
figure S1). In total, we obtained 36 and 19 candidate restorer of fertility-like (RFL) PPR genes
that clustered with the 6 Rf genes in the G. raimondii (D5) and G. arboreum (A2) genomes,
respectively (Additional file 2: table S1 and table S2).
Furthermore, a comprehensive phylogenetic analysis was performed on the 36 PPR candidate
genes from G. raimondii (D5), the 19 PPR candidate genes from G. arboreum (A2) and 15 PPR
genes from G. hirsutum (AD1) (data unpublished), with the 6 Rf genes as outgroups (figure 2).
Eight PPR genes from the G. raimondii (D5), G. arboreum (A2) and G. hirsutum
(AD1) genomes clustered into one clade. Among them, two homologous pairs, Gorai.005G0470
(D5) and Cotton_A_08373 (A2), and Gorai.007G1431 (D5) and Cotton_A_26557 (A2), along with
Gorai.006G2471 (D5), were closely related to Rf_PPR592 in petunia.
Gorai.010G0536 (D5) and GhK14 (AD1) were sister to PPRB_L1 of rapeseed, Rfo_PPRB of
radish and RPF1 of Arabidopsis. GhPPR3 (AD1) clustered with Rf1a and Rf1b of rice (figure
2). These 8 PPR candidate genes might be associated with fertility restoration in cotton, as
studies have shown that Rf genes and highly homologous RFL genes in plant species always form
a single evolutionary clade (Fujii et al. 2011; Melonek et al. 2016; Sykes et al. 2017).
Fertility of the cytoplasmic male sterility (CMS) line with G. harknessii (D2-2) cytoplasm is restored by
Rf1 (Feng et al. 2005), which was mapped to chromosome D05 (Liu et al. 2003; Li et al.
2007; Wang et al. 2009). Molecular markers tightly linked to Rf1 include UBC679-700 and
BNL4047-170 (Yin et al. 2006), CIR179-200 and CM042-150 (Li et al. 2007), and Y1107-350 and
TRAP425 (Wang et al. 2009), but none of these markers could be aligned to the anchored G. hirsutum
chromosomes (Zhang et al. 2015). In addition, we predicted 55 PPR genes on
chromosome D05 of the G. hirsutum (AD1) genome (data unpublished). Phylogenetic
analysis of these 55 PPR genes yielded 3 PPR candidate genes that clustered with the 6 restorer
genes (Additional file 1: figure S2). These results indicate that these PPR candidate genes
might be closely related to the 6 Rf genes from 5 other plant species.
Selective constraints on PPR genes
G. raimondii and G. arboreum diverged from a common ancestor about 10 million years ago
and are similar in gene number and sequence (Li et al. 2014). We found 377 pairs of
homologous PPR genes (Additional file 3: table S3) between the two genomes; that is, 78% of the PPR
genes in the G. raimondii (D5) genome were homologous to 87% of the PPR genes in the G. arboreum
(A2) genome, suggesting that most PPR genes in the two genomes co-evolved.
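The two percentages follow from the counts reported in this study (377 homologous pairs; 482 PPR genes in D5 and 433 in A2). A minimal check, with variable names chosen here for illustration:

```python
# Counts reported in the text.
homologous_pairs = 377
d5_ppr_genes = 482
a2_ppr_genes = 433

# Share of each genome's PPR genes that has a homolog in the other genome.
d5_share = 100 * homologous_pairs / d5_ppr_genes
a2_share = 100 * homologous_pairs / a2_ppr_genes

print(f"D5: {d5_share:.1f}%  A2: {a2_share:.1f}%")
```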
To study the evolutionary pattern of the PPR gene families in cotton, we calculated the
nucleotide nonsynonymous substitution rate (dN), the nucleotide synonymous substitution rate
(dS) and the dN/dS ratio (Jore et al. 2011). Most PPR genes were under
purifying selection (figure 3A, Additional file 3: table S3). Interestingly, the average dN and dS
values of the RFL genes (36 D5-RFLs and 19 A2-RFLs) were higher than those of other PPR genes, as
also reported by Fujii et al. (2011). The D5-RFLs evolved faster than other PPR genes, whereas
the A2-RFLs had a lower evolutionary rate than other PPR genes (figure 3C, Additional
file 3: table S3). It is likely that the restorer gene was derived from the D sub-genome (Wu et al.
2014), especially for cotton lines with D-genome sterile cytoplasm, such as 2074A in our study,
which combines the G. hirsutum nucleus with G. harknessii sterile cytoplasm, resulting in a
specific nuclear-cytoplasmic interaction. This may be a much more complex
question than the difference between polyploid and diploid cotton, because most cotton CMS lines
were created by hybridization between different species.
In addition, to clarify the relationship between the evolutionary pattern of PPR
genes and their biological functions in cotton, we performed GO annotation of the A2-D5
homologous PPR genes (Additional file 4: figure S3) and categorized them by dN, dS and dN/dS
values. PPR genes related to localization had the lowest dN/dS values
(figure 3B, Additional file 3: table S3), which suggests that this kind of PPR gene was under
evolutionary constraint during the divergence of G. raimondii (D5) and G. arboreum
(A2). Most PPR genes were targeted to mitochondria and a few to chloroplasts, which is
consistent with the organelle-targeting peptide sequence at the N-terminus of most PPR proteins
(Lurin et al. 2004).
Subcellular localization and Subfamily analysis of PPR candidate genes
Of the 36 D5-RFLs, 19 A2-RFLs and 15 AD1 PPR genes in cotton and the 6 Rf genes from other species,
most were targeted to mitochondria and a few to chloroplasts. These results were supported
by subcellular localization predictions from three programs (TargetP, Predotar and ProtComp). That is,
72% of the PPR genes were targeted to mitochondria, 10% to chloroplasts, and 16% overlapped (figure 4),
in agreement with the observation that most Rf-PPR genes are targeted to mitochondria (Bentolila et al. 2002;
Komori et al. 2004; Lurin et al. 2004).
The PPR gene family is divided into the PLS subfamily and the P subfamily, and the PLS subfamily
is further subdivided into four groups: the PLS, E, E+ and DYW groups
(Lurin et al. 2004). We analyzed the PPR motif arrangement of the 36 D5-RFLs, 19
A2-RFLs and 15 G. hirsutum (AD1) PPR genes in cotton and the 6 Rf genes from other species using
the HMM profiles (defined by 7 conserved motifs: P, L, L2, S, E, E+ and DYW) of the PPR
gene family in Arabidopsis thaliana. The 6 Rf-PPR genes belonged to the P subfamily (Bentolila
et al. 2002), and the 36 D5-RFLs and 19 A2-RFLs were also assigned to the P subfamily. However,
the 15 G. hirsutum (AD1) PPR genes covered all groups of the PPR gene family, in which a
variety of classical PPR motifs were arranged in a particular order (Lurin et al. 2004)
(Additional file 5: table S4).
RNA-seq and qRT-PCR analyses of PPR candidate gene expressions
To verify whether these PPR candidate genes are associated with fertility restoration
in cotton, we analyzed the expression of the 36 D5-RFLs, 19 A2-RFLs and 15 AD1 PPR genes
based on RNA-seq data from young buds of the CMS line 2074A, the maintainer line 2074B and the fertile
material FA (unpublished). In contrast to the maintainer line 2074B, which contains normal
fertile cytoplasm from G. hirsutum, the CMS line 2074A and the fertile material FA share the
same male-sterile cytoplasm from G. harknessii. However, when crossed with the restorer
line E5903, which has a normal fertile nucleus and cytoplasm from G. harknessii, the sterile line
2074A produced the fertile FA owing to the combination of the dominant gene Rf with the original
recessive non-functional allele rf. All three cotton lines have nearly isogenic nuclear
genomes comprising the A sub-genome and D sub-genome, so any differences among them may lie in
the alleles and/or the expression of the same restorer gene. We found that most of
these PPR candidate genes were highly expressed in FA but lowly expressed in the maintainer
line and the sterile line (figure 5). Furthermore, 8 of these PPR candidate genes were up-regulated
in FA relative to the sterile line, which suggests that these candidate genes are likely related to
fertility restoration in cotton (table 2). Some restorer genes reduce the abundance of
CMS-related transcripts at the transcriptional or post-transcriptional level, such as Rf-PPR592 in
CMS-RM petunia (Bentolila et al. 2002). Other restorer genes
function at the genetic or protein level, such as Fr in CMS-Sprite bean (Mackenzie and
Chase 1990; Janska et al. 1998) and Rf3 in CMS-WA rice (Luo et al. 2013); thus, further
experiments are still needed to reveal the molecular mechanism of fertility restoration.
Furthermore, to validate the RNA-seq expression data experimentally, we
carried out qRT-PCR to analyze the differential expression of the PPR candidate genes in the CMS
lines 2074A and 2074S, their maintainer line 2074B, the restorer line E5903 and the fertile hybrid
F1 materials (FA, 2074A × E5903; FS, 2074S × E5903). Taking the cotton housekeeping gene
UBQ7 as the internal control, we analyzed the relative expression of the 8 PPR candidate genes in
young buds of the 6 cotton materials by real-time fluorescent quantitative PCR.
We found that the expression of two PPR candidate genes, Gorai.005G0470 (D5) and
Cotton_A_08373 (A2), was higher in FA than in the sterile line 2074A, and that the two genes showed
similar expression patterns across the 6 cotton materials (figure 6). Consistently, the fold changes of these
two genes in FA relative to the sterile line 2074A were 3.45 and 12.59 by RNA-seq, respectively. In
addition, these two PPR genes share high homology, which indicates that their common
ancestor gene appeared before the divergence of the D5 and A2 genomes. During
subsequent evolution, they were under purifying selection (Additional file 3: table S3).
The phylogenetic analyses showed that they are closely related to
the restorer gene Rf_PPR592 in petunia (Bentolila et al. 2002). In this study, the
progeny of the sterile line 2074A became the fertile FA through the putative Rf gene from the D2 nuclear
genome. Therefore, Gorai.005G0470, derived from D5, is more likely than Cotton_A_08373 from A2
to be the candidate Rf gene for the G. harknessii CMS line 2074A. We hope that our results
will be helpful for studying restorer genes in cotton.
Conclusion
In total, 482 and 433 PPR genes were identified in two diploid cotton species, G. raimondii (D5) and G.
arboreum (A2), respectively. They were evenly distributed over the chromosomes,
with a few clusters. Phylogenetic analyses yielded 36 D5-RFLs and 19 A2-RFLs, among which the
D5-RFLs evolved faster than other PPR genes. These RFLs, together with 15 AD1 PPR
genes, were subjected to a comprehensive phylogenetic analysis, which identified 8
cotton PPR candidate genes clustering with 6 Rf genes from other plant species. Two of the
PPR candidate genes, Gorai.005G0470 (D5) and Cotton_A_08373 (A2), were confirmed by RNA-seq
and qRT-PCR analyses to be up-regulated in the fertile line relative to the sterile line.
Our study provides preliminary insights into the evolution of PPR genes and RFL genes in
cotton.
Figure legends
Figure 1. Distribution of PPR gene numbers over chromosomes in the G. raimondii (D5) and G.
arboreum (A2) genomes. The number of PPR genes on each of the 13 chromosomes of G. raimondii (D5)
is denoted by “●”, and that of G. arboreum (A2) by “+”. In addition to the PPR genes on the
13 chromosomes, a few PPR genes whose chromosome location has not been identified
are grouped as “others”.
Figure 2. Comprehensive phylogenetic analysis of PPR genes from G. raimondii, G.
arboreum and G. hirsutum L. by the Maximum Likelihood method. Genes are marked with
different shapes according to species: box, G. raimondii; dot, G. arboreum; diamond, G.
hirsutum. Outgroups are six restorer genes from five species: Petunia × hybrida,
Oryza sativa ssp. indica, B. napus, R. sativus and A. thaliana. They are marked by triangles,
and the corresponding branches are in bold. Genes closely related to
the restorer genes are marked with solid shapes.
Figure 3. Nucleotide substitution rates of homologous PPR genes in G. raimondii (D5) and G.
arboreum (A2) genomes. (a) Density distribution of dN/dS values of PPR homologous genes
between G. raimondii (D5) and G. arboreum (A2) genomes. (b) Average nucleotide
substitution rates of RFLs and other PPR genes in G. raimondii (D5) and G. arboreum (A2)
genomes. (c) Box plot of the distribution of dN/dS values of D5-A2 PPR homologs across
second-level GO terms.
Figure 4. Subcellular localization of PPR genes in the G. raimondii (D5), G. arboreum (A2) and
G. hirsutum (AD1) genomes. TP, PD and PC represent the three programs TargetP, Predotar
and ProtComp, respectively. Dark blue denotes mitochondria, light blue chloroplasts
and white undetermined.
Figure 5. Expression analysis of PPR candidate genes in the G. raimondii (D5), G. arboreum (A2)
and G. hirsutum (AD1) genomes. Based on RNA-seq data from the sterile line 2074A, the maintainer
line 2074B and the fertile material AE1 (F1 [2074A × E5903]), the expression of PPR candidate
genes was calculated as RPKM. Gene expression is denoted by
color: green represents relative down-regulation and red relative up-regulation.
Two PPR candidate genes in the G. arboreum (A2) genome and four PPR candidate genes in the G.
raimondii (D5) genome, which were relatively up-regulated in AE1, are marked by red arrows on the right
(PPR-21 [Gorai.010G053600.1] and PPR-22 [Gorai.010G053600.2]
are two different transcripts of the same gene, Gorai.010G0536, so there are seven red
arrows). Two green arrows mark genes down-regulated in AE1.
Figure 6. Relative expression analysis of 8 PPR candidate genes in buds of 6 cotton materials
differing in fertility. The expression in buds of 2074A was taken as the control, and
UBQ7 was used as the reference gene. Values were calculated with the
2^(-ΔΔCt) method.
Additional files
Additional file 1: Figure S1. Single-chromosome phylogenetic analyses of PPR genes in G.
raimondii (D5) genome and G. arboreum (A2) genome by Maximum Likelihood method.
Figure S2. Phylogenetic analysis of PPR genes on chromosome D05 in the G. hirsutum (AD1)
genome. Box: G. raimondii; dot: G. arboreum; triangle: outgroups (Petunia × hybrida, Oryza
sativa ssp. indica, B. napus, R. sativus and A. thaliana); solid: candidate PPR genes.
Additional file 2: Table S1. Information of PPR candidate genes derived from 13
chromosomes of G. raimondii (D5) genome. Table S2. Information of PPR candidate genes
derived from 13 chromosomes of G. arboreum (A2) genome.
Table S1 Information of PPR candidate genes derived from 13 chromosomes of G. raimondii (D5) genome
Chromosome No. of gene Gene No. of sequence Sequence
chr01 1 Gorai.001G1316 1 Gorai.001G131600.1
chr02 2 Gorai.002G0718 3 Gorai.002G071800.1,
Gorai.002G071800.2
Gorai.002G1010 Gorai.002G101000.1
chr03 1 Gorai.003G1716 1 Gorai.003G171600.1
chr04 3 Gorai.004G2907 3 Gorai.004G290700.1
Gorai.004G2406 Gorai.004G240600.1
Gorai.004G2438 Gorai.004G243800.1
chr05 1 Gorai.005G0470 1 Gorai.005G047000.1
chr06 2 Gorai.006G2252 2 Gorai.006G225200.1
16
Gorai.006G2471 Gorai.006G247100.1
chr07 1 Gorai.007G1431 1 Gorai.007G143100.1
chr08 1 Gorai.008G0443 1 Gorai.008G044300.1
chr09 4 Gorai.009G3762 4 Gorai.009G376200.1
Gorai.009G2580 Gorai.009G258000.1
Gorai.009G0058 Gorai.009G005800.1
Gorai.009G1519 Gorai.009G151900.1
chr10 4 Gorai.010G2281 9 Gorai.010G228100.1,
Gorai.010G228100.2
Gorai.010G0536
Gorai.010G053600.1,
Gorai.010G053600.2
Gorai.010G0325
Gorai.010G032500.1,
Gorai.010G032500.2,
Gorai.010G032500.3
Gorai.010G0722
Gorai.010G072200.1,
Gorai.010G072200.2
chr11 10 Gorai.011G1557 11 Gorai.011G155700.1
Gorai.011G1515 Gorai.011G151500.1
Gorai.011G1514 Gorai.011G151400.1
Gorai.011G1511 Gorai.011G151100.1
Gorai.011G1512 Gorai.011G151200.1
Gorai.011G1451 Gorai.011G145100.1
Gorai.011G1464
Gorai.011G146400.1,
Gorai.011G146400.2
Gorai.011G1466 Gorai.011G146600.1
Gorai.011G1450 Gorai.011G145000.1
Gorai.011G1465 Gorai.011G146500.1
chr12 4 Gorai.012G1593 9 Gorai.012G159300.1, Gorai.012G159300.2, Gorai.012G159300.3
Gorai.012G1205 Gorai.012G120500.1, Gorai.012G120500.2
Gorai.012G0303 Gorai.012G030300.1, Gorai.012G030300.2
Gorai.012G1494 Gorai.012G149400.1, Gorai.012G149400.2
chr13 2 Gorai.013G0606 4 Gorai.013G060600.1, Gorai.013G060600.2
Gorai.013G0109 Gorai.013G010900.1, Gorai.013G010900.2
Total No. 36 50
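The per-chromosome counts in Table S1 can be cross-checked from the transcript identifiers alone. The following is a minimal sketch, not the authors' procedure: it assumes that identifiers follow the pattern Gorai.<chromosome>G<locus>.<isoform>, that stripping the final ".N" suffix yields the gene locus, and the function name tally_by_chromosome is hypothetical.

```python
from collections import defaultdict


def tally_by_chromosome(transcript_ids):
    """Count distinct gene loci and transcripts per chromosome.

    Assumes IDs such as 'Gorai.010G228100.1', where '010' encodes the
    chromosome and the trailing '.1' is the isoform number (an assumption
    about the naming scheme, not something stated in the table).
    """
    loci = defaultdict(set)      # chromosome -> set of gene loci
    n_seq = defaultdict(int)     # chromosome -> number of transcripts
    for tid in transcript_ids:
        locus, _, _isoform = tid.rpartition(".")       # drop the isoform suffix
        chrom_no = int(locus.split(".")[1].split("G")[0])
        chrom = f"chr{chrom_no:02d}"
        loci[chrom].add(locus)
        n_seq[chrom] += 1
    return {c: (len(loci[c]), n_seq[c]) for c in loci}


# Transcript IDs copied from the chr10 and chr13 rows of Table S1.
ids = [
    "Gorai.010G228100.1", "Gorai.010G228100.2",
    "Gorai.010G053600.1", "Gorai.010G053600.2",
    "Gorai.010G032500.1", "Gorai.010G032500.2", "Gorai.010G032500.3",
    "Gorai.010G072200.1", "Gorai.010G072200.2",
    "Gorai.013G060600.1", "Gorai.013G060600.2",
    "Gorai.013G010900.1", "Gorai.013G010900.2",
]
counts = tally_by_chromosome(ids)
print(counts)
```

For these two chromosomes the tally gives 4 genes and 9 sequences on chr10 and 2 genes and 4 sequences on chr13, matching the table.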
Table S2 Information of PPR candidate genes derived from 13 chromosomes of G. arboreum (A2) genome
Chromosome No. of gene Gene No. of sequence Sequence
chr01 1 Cotton_A_32157 1 Cotton_A_32157
chr02 3 Cotton_A_37656 3 Cotton_A_37656
Cotton_A_28832 Cotton_A_28832
Cotton_A_00514 Cotton_A_00514
chr03 2 Cotton_A_16847 2 Cotton_A_16847
Cotton_A_18522 Cotton_A_18522
chr04 2 Cotton_A_26557 2 Cotton_A_26557
Cotton_A_03817 Cotton_A_03817
chr05 1 Cotton_A_08373 1 Cotton_A_08373
chr06 1 Cotton_A_27681 1 Cotton_A_27681
chr07 1 Cotton_A_06850 1 Cotton_A_06850
chr08 0 -- 0 --
chr09 1 Cotton_A_02931 1 Cotton_A_02931
chr10 2 Cotton_A_04606 2 Cotton_A_04606
Cotton_A_29300 Cotton_A_29300
chr11 3 Cotton_A_13069 3 Cotton_A_13069
Cotton_A_17619 Cotton_A_17619
Cotton_A_14743 Cotton_A_14743
chr12 0 -- 0 --
chr13 1 Cotton_A_26837 1 Cotton_A_26837
others 1 Cotton_A_37173 1 Cotton_A_37173
Total No. 19 19
Additional file 3: Table S3. Nucleotide substitution rates of PPR homologous genes between
G. raimondii (D5) and G. arboreum (A2) genomes.
Table S3. Nucleotide substitution rates of PPR homologous genes between G. raimondii (D5) and G. arboreum (A2)
genomes
Number gene 2 (A2) vs. gene 1 (D5) dN dS dN/dS
1c 2 (Cotton_A_11317) vs. 1 (Gorai.009G1666) 0.017 0.009 1.980c
2 2 (Cotton_A_28224) vs. 1 (Gorai.013G0655) 0.016 0.014 1.140
3 2 (Cotton_A_09575) vs. 1 (Gorai.001G1210) 0.014 0.014 1.051
4 2 (Cotton_A_34843) vs. 1 (Gorai.012G1341) 0.015 0.014 1.042
5 2 (Cotton_A_31264) vs. 1 (Gorai.004G1601) 0.042 0.042 1.015
6 2 (Cotton_A_11768) vs. 1 (Gorai.012G1877) 0.022 0.022 1.013
7b 2 (Cotton_A_06304) vs. 1 (Gorai.002G0718) 0.014 0.014 0.999
8b 2 (Cotton_A_19072) vs. 1 (Gorai.012G0303) 0.035 0.037 0.957
9 2 (Cotton_A_01860) vs. 1 (Gorai.006G1315) 0.023 0.025 0.939
10 2 (Cotton_A_32325) vs. 1 (Gorai.001G2487) 0.019 0.021 0.935
11 2 (Cotton_A_27929) vs. 1 (Gorai.013G1240) 0.018 0.020 0.928
12 2 (Cotton_A_18719) vs. 1 (Gorai.001G2147) 0.046 0.050 0.917
13b 2 (Cotton_A_23084) vs. 1 (Gorai.011G1451) 0.015 0.018 0.851
14 2 (Cotton_A_02450) vs. 1 (Gorai.006G1772) 0.015 0.017 0.848
15 2 (Cotton_A_39828) vs. 1 (Gorai.001G2016) 0.016 0.019 0.832
16 2 (Cotton_A_20263) vs. 1 (Gorai.012G0213) 0.062 0.075 0.831
17 2 (Cotton_A_17523) vs. 1 (Gorai.011G0381) 0.013 0.015 0.825
18 2 (Cotton_A_35996) vs. 1 (Gorai.004G0976) 0.046 0.058 0.802
19 2 (Cotton_A_10814) vs. 1 (Gorai.008G0282) 0.120 0.152 0.792
20 2 (Cotton_A_11187) vs. 1 (Gorai.003G0412) 0.025 0.032 0.769
21 2 (Cotton_A_27680) vs. 1 (Gorai.008G1927) 0.018 0.024 0.763
22b 2 (Cotton_A_24724) vs. 1 (Gorai.006G2471) 0.027 0.036 0.748
23 2 (Cotton_A_24061) vs. 1 (Gorai.006G1651) 0.016 0.021 0.747
24 2 (Cotton_A_22811) vs. 1 (Gorai.005G1522) 0.019 0.026 0.745
25 2 (Cotton_A_40801) vs. 1 (Gorai.010G1366) 0.021 0.029 0.725
26 2 (Cotton_A_05635) vs. 1 (Gorai.001G0212) 0.018 0.025 0.718
27b 2 (Cotton_A_30591) vs. 1 (Gorai.003G1716) 0.032 0.046 0.710
28 2 (Cotton_A_03316) vs. 1 (Gorai.004G0658) 0.023 0.032 0.708
29 2 (Cotton_A_01798) vs. 1 (Gorai.006G1265) 0.013 0.019 0.699
30 2 (Cotton_A_26211) vs. 1 (Gorai.003G0971) 0.073 0.105 0.691
31 2 (Cotton_A_26989) vs. 1 (Gorai.004G0714) 0.012 0.017 0.684
32 2 (Cotton_A_33958) vs. 1 (Gorai.003G0875) 0.011 0.017 0.683
33 2 (Cotton_A_32710) vs. 1 (Gorai.008G2494) 0.015 0.022 0.674
34a 2 (Cotton_A_04606) vs. 1 (Gorai.009G1866) 0.016 0.023 0.672
35 2 (Cotton_A_22116) vs. 1 (Gorai.002G2139) 0.021 0.032 0.667
36 2 (Cotton_A_17776) vs. 1 (Gorai.008G2735) 0.019 0.029 0.666
37 2 (Cotton_A_17484) vs. 1 (Gorai.011G0686) 0.015 0.023 0.663
38 2 (Cotton_A_36527) vs. 1 (Gorai.013G1385) 0.019 0.029 0.660
39 2 (Cotton_A_02090) vs. 1 (Gorai.008G0415) 0.021 0.031 0.659
40 2 (Cotton_A_13788) vs. 1 (Gorai.010G1922) 0.014 0.021 0.656
41 2 (Cotton_A_10685) vs. 1 (Gorai.007G3051) 0.013 0.020 0.630
42 2 (Cotton_A_37988) vs. 1 (Gorai.001G2469) 0.014 0.022 0.629
43 2 (Cotton_A_33828) vs. 1 (Gorai.013G0985) 0.014 0.022 0.625
44 2 (Cotton_A_22551) vs. 1 (Gorai.004G0846) 0.015 0.024 0.614
45 2 (Cotton_A_06956) vs. 1 (Gorai.009G0970) 0.052 0.085 0.613
46 2 (Cotton_A_34686) vs. 1 (Gorai.001G1887) 0.016 0.026 0.611
47 2 (Cotton_A_23085) vs. 1 (Gorai.011G1450) 0.064 0.105 0.607
48 2 (Cotton_A_06080) vs. 1 (Gorai.012G0077) 0.077 0.127 0.607
49 2 (Cotton_A_23145) vs. 1 (Gorai.013G1741) 0.021 0.034 0.601
50 2 (Cotton_A_32339) vs. 1 (Gorai.007G0962) 0.016 0.027 0.598
51ab 2 (Cotton_A_26557) vs. 1 (Gorai.007G1431) 0.064 0.107 0.597
52 2 (Cotton_A_20633) vs. 1 (Gorai.007G0758) 0.032 0.055 0.590
53 2 (Cotton_A_18215) vs. 1 (Gorai.005G1471) 0.014 0.024 0.588
54 2 (Cotton_A_07425) vs. 1 (Gorai.009G4553) 0.017 0.029 0.588
55 2 (Cotton_A_16281) vs. 1 (Gorai.010G0532) 0.014 0.023 0.586
56 2 (Cotton_A_36956) vs. 1 (Gorai.012G0712) 0.012 0.021 0.585
57 2 (Cotton_A_16549) vs. 1 (Gorai.006G0786) 0.019 0.033 0.582
58 2 (Cotton_A_28680) vs. 1 (Gorai.002G1322) 0.016 0.027 0.579
59 2 (Cotton_A_30727) vs. 1 (Gorai.011G1778) 0.018 0.031 0.578
60 2 (Cotton_A_26765) vs. 1 (Gorai.001G1106) 0.011 0.020 0.575
61 2 (Cotton_A_24224) vs. 1 (Gorai.013G1917) 0.017 0.030 0.574
62 2 (Cotton_A_28020) vs. 1 (Gorai.001G0406) 0.015 0.025 0.572
63 2 (Cotton_A_04063) vs. 1 (Gorai.009G2942) 0.016 0.029 0.567
64 2 (Cotton_A_07524) vs. 1 (Gorai.011G0889) 0.012 0.021 0.565
65 2 (Cotton_A_10449) vs. 1 (Gorai.011G2299) 0.082 0.148 0.555
66 2 (Cotton_A_36721) vs. 1 (Gorai.002G0574) 0.079 0.144 0.553
67 2 (Cotton_A_30330) vs. 1 (Gorai.010G2078) 0.015 0.028 0.553
68 2 (Cotton_A_06176) vs. 1 (Gorai.008G1784) 0.012 0.022 0.551
69 2 (Cotton_A_20922) vs. 1 (Gorai.008G1195) 0.015 0.027 0.549
70 2 (Cotton_A_01973) vs. 1 (Gorai.007G0468) 0.015 0.028 0.549
71 2 (Cotton_A_33269) vs. 1 (Gorai.003G1164) 0.024 0.043 0.549
72 2 (Cotton_A_24707) vs. 1 (Gorai.011G2260) 0.016 0.029 0.543
73 2 (Cotton_A_07464) vs. 1 (Gorai.009G4512) 0.015 0.028 0.542
74 2 (Cotton_A_13263) vs. 1 (Gorai.001G1344) 0.015 0.028 0.539
75 2 (Cotton_A_25685) vs. 1 (Gorai.005G1720) 0.016 0.029 0.538
76 2 (Cotton_A_09296) vs. 1 (Gorai.007G2760) 0.020 0.038 0.536
77 2 (Cotton_A_40007) vs. 1 (Gorai.013G0825) 0.013 0.024 0.533
78 2 (Cotton_A_20041) vs. 1 (Gorai.013G2630) 0.019 0.036 0.532
79 2 (Cotton_A_13147) vs. 1 (Gorai.004G1887) 0.019 0.035 0.528
80 2 (Cotton_A_29248) vs. 1 (Gorai.010G2243) 0.015 0.029 0.526
81 2 (Cotton_A_34976) vs. 1 (Gorai.011G1657) 0.027 0.051 0.523
82 2 (Cotton_A_35366) vs. 1 (Gorai.002G1719) 0.016 0.030 0.521
83 2 (Cotton_A_14076) vs. 1 (Gorai.001G1484) 0.016 0.031 0.521
84 2 (Cotton_A_33069) vs. 1 (Gorai.006G0742) 0.066 0.127 0.520
85 2 (Cotton_A_13386) vs. 1 (Gorai.008G1951) 0.026 0.050 0.516
86 2 (Cotton_A_37189) vs. 1 (Gorai.005G1479) 0.017 0.032 0.515
87 2 (Cotton_A_25515) vs. 1 (Gorai.010G1802) 0.016 0.032 0.511
88 2 (Cotton_A_16155) vs. 1 (Gorai.011G0974) 0.012 0.023 0.511
89 2 (Cotton_A_15955) vs. 1 (Gorai.008G1022) 0.016 0.031 0.509
90ab 2 (Cotton_A_18522) vs. 1 (Gorai.013G0606) 0.075 0.148 0.509
91 2 (Cotton_A_27424) vs. 1 (Gorai.009G1101) 0.014 0.027 0.508
92 2 (Cotton_A_22374) vs. 1 (Gorai.013G1986) 0.012 0.023 0.506
93 2 (Cotton_A_04268) vs. 1 (Gorai.003G1421) 0.025 0.049 0.505
94 2 (Cotton_A_00752) vs. 1 (Gorai.005G2425) 0.019 0.038 0.503
95 2 (Cotton_A_40015) vs. 1 (Gorai.010G1439) 0.012 0.023 0.501
96 2 (Cotton_A_03759) vs. 1 (Gorai.007G1608) 0.010 0.020 0.501
97a 2 (Cotton_A_14743) vs. 1 (Gorai.006G0084) 0.018 0.037 0.498
98 2 (Cotton_A_23993) vs. 1 (Gorai.004G0731) 0.015 0.030 0.498
99 2 (Cotton_A_09794) vs. 1 (Gorai.003G0326) 0.016 0.032 0.491
100 2 (Cotton_A_21717) vs. 1 (Gorai.005G2322) 0.010 0.021 0.490
101 2 (Cotton_A_41153) vs. 1 (Gorai.013G1059) 0.017 0.035 0.490
102 2 (Cotton_A_09162) vs. 1 (Gorai.007G0312) 0.016 0.033 0.489
103 2 (Cotton_A_25415) vs. 1 (Gorai.009G4142) 0.016 0.032 0.488
104b 2 (Cotton_A_06370) vs. 1 (Gorai.012G1494) 0.012 0.024 0.486
105 2 (Cotton_A_37545) vs. 1 (Gorai.003G1355) 0.013 0.027 0.485
106 2 (Cotton_A_25609) vs. 1 (Gorai.005G1943) 0.024 0.049 0.483
107 2 (Cotton_A_01539) vs. 1 (Gorai.003G0072) 0.014 0.029 0.480
108 2 (Cotton_A_04493) vs. 1 (Gorai.008G1563) 0.014 0.030 0.479
109 2 (Cotton_A_33520) vs. 1 (Gorai.009G3754) 0.029 0.060 0.478
110 2 (Cotton_A_01088) vs. 1 (Gorai.009G0711) 0.019 0.039 0.477
111 2 (Cotton_A_08027) vs. 1 (Gorai.008G2080) 0.016 0.035 0.476
112 2 (Cotton_A_20227) vs. 1 (Gorai.008G1687) 0.011 0.023 0.476
113 2 (Cotton_A_07222) vs. 1 (Gorai.001G0292) 0.059 0.125 0.473
114 2 (Cotton_A_32576) vs. 1 (Gorai.008G0862) 0.016 0.033 0.472
115 2 (Cotton_A_13722) vs. 1 (Gorai.009G2054) 0.022 0.047 0.471
116 2 (Cotton_A_23893) vs. 1 (Gorai.003G1142) 0.038 0.080 0.471
117 2 (Cotton_A_12931) vs. 1 (Gorai.009G0926) 0.012 0.025 0.470
118 2 (Cotton_A_28094) vs. 1 (Gorai.005G1628) 0.017 0.035 0.469
119 2 (Cotton_A_39104) vs. 1 (Gorai.009G4026) 0.014 0.030 0.468
120 2 (Cotton_A_01590) vs. 1 (Gorai.003G0127) 0.047 0.101 0.468
121 2 (Cotton_A_14708) vs. 1 (Gorai.006G0114) 0.020 0.043 0.466
122 2 (Cotton_A_29057) vs. 1 (Gorai.012G0843) 0.013 0.028 0.465
123 2 (Cotton_A_00282) vs. 1 (Gorai.002G2674) 0.015 0.033 0.458
124 2 (Cotton_A_24369) vs. 1 (Gorai.007G2126) 0.013 0.028 0.455
125 2 (Cotton_A_36268) vs. 1 (Gorai.006G0702) 0.017 0.038 0.453
126 2 (Cotton_A_32770) vs. 1 (Gorai.007G1442) 0.011 0.023 0.452
127 2 (Cotton_A_28557) vs. 1 (Gorai.002G1203) 0.012 0.026 0.452
128 2 (Cotton_A_19260) vs. 1 (Gorai.008G2669) 0.014 0.032 0.451
129 2 (Cotton_A_26614) vs. 1 (Gorai.004G0254) 0.014 0.031 0.447
130 2 (Cotton_A_37026) vs. 1 (Gorai.008G0731) 0.045 0.100 0.445
131b 2 (Cotton_A_30160) vs. 1 (Gorai.011G1557) 0.078 0.175 0.444
132 2 (Cotton_A_28368) vs. 1 (Gorai.007G2806) 0.013 0.029 0.443
133 2 (Cotton_A_12602) vs. 1 (Gorai.004G0487) 0.018 0.041 0.441
134 2 (Cotton_A_17392) vs. 1 (Gorai.008G2642) 0.016 0.037 0.439
135 2 (Cotton_A_29863) vs. 1 (Gorai.012G0397) 0.011 0.024 0.438
136b 2 (Cotton_A_13296) vs. 1 (Gorai.001G1316) 0.135 0.311 0.433
137 2 (Cotton_A_26223) vs. 1 (Gorai.003G0959) 0.014 0.033 0.433
138 2 (Cotton_A_13509) vs. 1 (Gorai.010G1420) 0.013 0.031 0.426
139b 2 (Cotton_A_16278) vs. 1 (Gorai.010G0536) 0.018 0.041 0.426
140 2 (Cotton_A_10671) vs. 1 (Gorai.007G3063) 0.038 0.090 0.425
141 2 (Cotton_A_13298) vs. 1 (Gorai.001G1314) 0.010 0.024 0.423
142 2 (Cotton_A_38722) vs. 1 (Gorai.011G1213) 0.011 0.027 0.423
143 2 (Cotton_A_25573) vs. 1 (Gorai.009G1720) 0.017 0.040 0.422
144 2 (Cotton_A_21594) vs. 1 (Gorai.006G1964) 0.012 0.028 0.421
145 2 (Cotton_A_04706) vs. 1 (Gorai.009G1765) 0.014 0.032 0.420
146 2 (Cotton_A_17973) vs. 1 (Gorai.005G2043) 0.009 0.021 0.420
147b 2 (Cotton_A_34636) vs. 1 (Gorai.010G0722) 0.011 0.026 0.419
148 2 (Cotton_A_19585) vs. 1 (Gorai.005G0851) 0.016 0.038 0.419
149 2 (Cotton_A_28977) vs. 1 (Gorai.006G0322) 0.016 0.040 0.415
150 2 (Cotton_A_27278) vs. 1 (Gorai.010G2089) 0.020 0.049 0.415
151 2 (Cotton_A_16896) vs. 1 (Gorai.009G0068) 0.021 0.049 0.415
152 2 (Cotton_A_28444) vs. 1 (Gorai.005G1433) 0.013 0.033 0.413
153 2 (Cotton_A_03342) vs. 1 (Gorai.013G1342) 0.014 0.034 0.413
154 2 (Cotton_A_00872) vs. 1 (Gorai.013G0288) 0.012 0.028 0.413
155 2 (Cotton_A_10869) vs. 1 (Gorai.007G0604) 0.016 0.040 0.413
156 2 (Cotton_A_16777) vs. 1 (Gorai.013G1692) 0.023 0.056 0.411
157 2 (Cotton_A_35842) vs. 1 (Gorai.005G0550) 0.017 0.041 0.405
158 2 (Cotton_A_10841) vs. 1 (Gorai.009G0215) 0.011 0.028 0.405
159 2 (Cotton_A_26278) vs. 1 (Gorai.006G1122) 0.008 0.019 0.404
160 2 (Cotton_A_01069) vs. 1 (Gorai.009G0728) 0.011 0.028 0.403
161 2 (Cotton_A_09130) vs. 1 (Gorai.007G0282) 0.014 0.036 0.402
162 2 (Cotton_A_12265) vs. 1 (Gorai.007G1696) 0.017 0.043 0.401
163 2 (Cotton_A_18227) vs. 1 (Gorai.010G0016) 0.010 0.025 0.401
164 2 (Cotton_A_18697) vs. 1 (Gorai.001G2161) 0.017 0.043 0.400
165 2 (Cotton_A_11292) vs. 1 (Gorai.009G1643) 0.026 0.065 0.399
166 2 (Cotton_A_02403) vs. 1 (Gorai.007G0141) 0.010 0.025 0.399
167 2 (Cotton_A_30825) vs. 1 (Gorai.012G0828) 0.010 0.026 0.398
168 2 (Cotton_A_34712) vs. 1 (Gorai.009G3026) 0.016 0.040 0.397
169 2 (Cotton_A_16289) vs. 1 (Gorai.010G0526) 0.025 0.063 0.397
170a 2 (Cotton_A_37656) vs. 1 (Gorai.003G0508) 0.010 0.025 0.396
171a 2 (Cotton_A_03817) vs. 1 (Gorai.007G1554) 0.009 0.022 0.395
172 2 (Cotton_A_13567) vs. 1 (Gorai.008G2191) 0.019 0.049 0.392
173 2 (Cotton_A_36306) vs. 1 (Gorai.006G0750) 0.034 0.087 0.391
174 2 (Cotton_A_15492) vs. 1 (Gorai.007G0816) 0.017 0.043 0.390
175 2 (Cotton_A_26419) vs. 1 (Gorai.008G1310) 0.017 0.044 0.389
176 2 (Cotton_A_41044) vs. 1 (Gorai.005G1192) 0.014 0.035 0.388
177 2 (Cotton_A_32294) vs. 1 (Gorai.001G1527) 0.009 0.022 0.387
178 2 (Cotton_A_31588) vs. 1 (Gorai.012G1018) 0.008 0.021 0.386
179 2 (Cotton_A_27659) vs. 1 (Gorai.013G0860) 0.044 0.115 0.386
180 2 (Cotton_A_15598) vs. 1 (Gorai.010G2189) 0.014 0.036 0.383
181b 2 (Cotton_A_30368) vs. 1 (Gorai.009G2580) 0.010 0.026 0.380
182b 2 (Cotton_A_24432) vs. 1 (Gorai.011G1512) 0.017 0.046 0.378
183 2 (Cotton_A_05741) vs. 1 (Gorai.004G2583) 0.009 0.025 0.378
184 2 (Cotton_A_30003) vs. 1 (Gorai.001G1384) 0.013 0.034 0.377
185 2 (Cotton_A_04299) vs. 1 (Gorai.005G0022) 0.009 0.024 0.376
186 2 (Cotton_A_02560) vs. 1 (Gorai.013G2504) 0.016 0.043 0.372
187 2 (Cotton_A_35950) vs. 1 (Gorai.002G2403) 0.019 0.052 0.371
188ab 2 (Cotton_A_08373) vs. 1 (Gorai.005G0470) 0.013 0.035 0.370
189 2 (Cotton_A_26200) vs. 1 (Gorai.003G0982) 0.015 0.042 0.370
190 2 (Cotton_A_17963) vs. 1 (Gorai.005G2054) 0.007 0.018 0.370
191 2 (Cotton_A_17735) vs. 1 (Gorai.008G2697) 0.017 0.047 0.369
192b 2 (Cotton_A_02057) vs. 1 (Gorai.008G0443) 0.018 0.048 0.368
193 2 (Cotton_A_30072) vs. 1 (Gorai.006G0901) 0.013 0.034 0.364
194 2 (Cotton_A_31804) vs. 1 (Gorai.004G1641) 0.018 0.051 0.363
195 2 (Cotton_A_26302) vs. 1 (Gorai.004G1943) 0.009 0.024 0.362
196 2 (Cotton_A_39558) vs. 1 (Gorai.009G0359) 0.010 0.027 0.362
197 2 (Cotton_A_40005) vs. 1 (Gorai.001G1983) 0.011 0.031 0.361
198 2 (Cotton_A_10110) vs. 1 (Gorai.011G2952) 0.021 0.058 0.360
199 2 (Cotton_A_38063) vs. 1 (Gorai.006G0458) 0.011 0.031 0.359
200 2 (Cotton_A_34312) vs. 1 (Gorai.001G1555) 0.054 0.150 0.357
201 2 (Cotton_A_31539) vs. 1 (Gorai.012G0855) 0.016 0.045 0.355
202 2 (Cotton_A_35528) vs. 1 (Gorai.009G0027) 0.012 0.034 0.355
203 2 (Cotton_A_35532) vs. 1 (Gorai.009G0032) 0.009 0.026 0.354
204 2 (Cotton_A_20411) vs. 1 (Gorai.009G3837) 0.010 0.028 0.353
205 2 (Cotton_A_23909) vs. 1 (Gorai.005G0195) 0.103 0.296 0.349
206 2 (Cotton_A_38340) vs. 1 (Gorai.010G1312) 0.016 0.047 0.348
207 2 (Cotton_A_07872) vs. 1 (Gorai.008G2292) 0.017 0.049 0.347
208 2 (Cotton_A_19584) vs. 1 (Gorai.005G0852) 0.012 0.036 0.346
209 2 (Cotton_A_23809) vs. 1 (Gorai.005G1813) 0.012 0.035 0.346
210 2 (Cotton_A_00454) vs. 1 (Gorai.002G2491) 0.014 0.041 0.343
211 2 (Cotton_A_10325) vs. 1 (Gorai.001G0085) 0.008 0.023 0.342
212 2 (Cotton_A_34264) vs. 1 (Gorai.004G1072) 0.011 0.033 0.341
213 2 (Cotton_A_37646) vs. 1 (Gorai.008G0725) 0.011 0.031 0.340
214 2 (Cotton_A_00653) vs. 1 (Gorai.005G2529) 0.016 0.048 0.340
215 2 (Cotton_A_24627) vs. 1 (Gorai.011G1119) 0.017 0.050 0.340
216 2 (Cotton_A_16144) vs. 1 (Gorai.009G2412) 0.010 0.029 0.339
217a 2 (Cotton_A_02931) vs. 1 (Gorai.011G0480) 0.013 0.039 0.337
218 2 (Cotton_A_35522) vs. 1 (Gorai.009G1913) 0.006 0.018 0.336
219 2 (Cotton_A_25881) vs. 1 (Gorai.008G1273) 0.013 0.038 0.336
220 2 (Cotton_A_10533) vs. 1 (Gorai.013G2656) 0.018 0.054 0.334
221 2 (Cotton_A_09637) vs. 1 (Gorai.006G2666) 0.012 0.035 0.333
222 2 (Cotton_A_01123) vs. 1 (Gorai.009G0676) 0.011 0.034 0.333
223 2 (Cotton_A_24301) vs. 1 (Gorai.004G2353) 0.013 0.039 0.330
224 2 (Cotton_A_05860) vs. 1 (Gorai.001G2321) 0.023 0.071 0.329
225 2 (Cotton_A_26945) vs. 1 (Gorai.009G3291) 0.009 0.027 0.329
226 2 (Cotton_A_12830) vs. 1 (Gorai.013G1878) 0.012 0.037 0.328
227 2 (Cotton_A_15541) vs. 1 (Gorai.013G1581) 0.014 0.042 0.328
228 2 (Cotton_A_06044) vs. 1 (Gorai.012G0114) 0.010 0.031 0.328
229 2 (Cotton_A_07988) vs. 1 (Gorai.008G2109) 0.015 0.045 0.327
230 2 (Cotton_A_17092) vs. 1 (Gorai.009G1232) 0.012 0.037 0.327
231 2 (Cotton_A_31527) vs. 1 (Gorai.012G0866) 0.010 0.032 0.325
232 2 (Cotton_A_15473) vs. 1 (Gorai.007G0798) 0.009 0.028 0.321
233 2 (Cotton_A_09139) vs. 1 (Gorai.007G0293) 0.036 0.112 0.320
234 2 (Cotton_A_32227) vs. 1 (Gorai.010G2146) 0.010 0.031 0.320
235 2 (Cotton_A_09931) vs. 1 (Gorai.004G2035) 0.013 0.042 0.318
236 2 (Cotton_A_23072) vs. 1 (Gorai.011G1462) 0.010 0.030 0.317
237 2 (Cotton_A_32556) vs. 1 (Gorai.006G2453) 0.013 0.042 0.317
238 2 (Cotton_A_27318) vs. 1 (Gorai.013G2436) 0.013 0.041 0.316
239 2 (Cotton_A_33638) vs. 1 (Gorai.010G1884) 0.009 0.029 0.315
240 2 (Cotton_A_21338) vs. 1 (Gorai.008G0551) 0.008 0.025 0.314
241 2 (Cotton_A_25940) vs. 1 (Gorai.006G0534) 0.011 0.036 0.312
242 2 (Cotton_A_21752) vs. 1 (Gorai.002G2007) 0.010 0.033 0.307
243 2 (Cotton_A_01226) vs. 1 (Gorai.009G0573) 0.011 0.036 0.306
244 2 (Cotton_A_30013) vs. 1 (Gorai.001G1393) 0.017 0.057 0.304
245 2 (Cotton_A_28264) vs. 1 (Gorai.012G1386) 0.011 0.037 0.303
246 2 (Cotton_A_09578) vs. 1 (Gorai.001G1213) 0.010 0.032 0.302
247 2 (Cotton_A_18311) vs. 1 (Gorai.012G0512) 0.008 0.025 0.302
248 2 (Cotton_A_32575) vs. 1 (Gorai.008G0863) 0.010 0.034 0.301
249 2 (Cotton_A_17786) vs. 1 (Gorai.002G0314) 0.018 0.059 0.300
250 2 (Cotton_A_27064) vs. 1 (Gorai.006G0233) 0.016 0.054 0.300
251 2 (Cotton_A_01162) vs. 1 (Gorai.009G0634) 0.007 0.022 0.299
252b 2 (Cotton_A_03417) vs. 1 (Gorai.012G1205) 0.010 0.032 0.297
253 2 (Cotton_A_08971) vs. 1 (Gorai.005G2117) 0.011 0.036 0.295
254 2 (Cotton_A_13392) vs. 1 (Gorai.008G1947) 0.011 0.037 0.289
255 2 (Cotton_A_18561) vs. 1 (Gorai.003G1318) 0.015 0.051 0.287
256 2 (Cotton_A_18120) vs. 1 (Gorai.007G2674) 0.009 0.031 0.286
257 2 (Cotton_A_04319) vs. 1 (Gorai.005G0042) 0.018 0.064 0.284
258 2 (Cotton_A_14721) vs. 1 (Gorai.006G0103) 0.013 0.044 0.284
259 2 (Cotton_A_30439) vs. 1 (Gorai.013G1427) 0.009 0.033 0.281
260 2 (Cotton_A_14702) vs. 1 (Gorai.006G0119) 0.009 0.031 0.280
261 2 (Cotton_A_13590) vs. 1 (Gorai.008G2169) 0.018 0.067 0.277
262 2 (Cotton_A_22790) vs. 1 (Gorai.009G3664) 0.019 0.071 0.274
263 2 (Cotton_A_38720) vs. 1 (Gorai.008G0933) 0.011 0.039 0.273
264 2 (Cotton_A_27473) vs. 1 (Gorai.004G0668) 0.023 0.085 0.273
265 2 (Cotton_A_36351) vs. 1 (Gorai.007G2493) 0.012 0.043 0.273
266b 2 (Cotton_A_23070) vs. 1 (Gorai.011G1464) 0.008 0.028 0.273
267 2 (Cotton_A_10904) vs. 1 (Gorai.013G1892) 0.007 0.027 0.269
268 2 (Cotton_A_22715) vs. 1 (Gorai.001G2759) 0.013 0.047 0.269
269 2 (Cotton_A_19225) vs. 1 (Gorai.001G0393) 0.010 0.038 0.268
270 2 (Cotton_A_37812) vs. 1 (Gorai.005G1369) 0.011 0.040 0.266
271b 2 (Cotton_A_23068) vs. 1 (Gorai.011G1466) 0.011 0.042 0.266
272 2 (Cotton_A_14943) vs. 1 (Gorai.012G1311) 0.014 0.052 0.266
273 2 (Cotton_A_34606) vs. 1 (Gorai.005G0815) 0.013 0.050 0.264
274 2 (Cotton_A_37555) vs. 1 (Gorai.007G3444) 0.016 0.062 0.264
275 2 (Cotton_A_18626) vs. 1 (Gorai.001G0621) 0.011 0.041 0.262
276 2 (Cotton_A_36364) vs. 1 (Gorai.013G1093) 0.009 0.033 0.261
277 2 (Cotton_A_00058) vs. 1 (Gorai.002G0209) 0.013 0.051 0.260
278 2 (Cotton_A_15782) vs. 1 (Gorai.008G0143) 0.010 0.038 0.260
279 2 (Cotton_A_07841) vs. 1 (Gorai.009G1982) 0.008 0.029 0.259
280 2 (Cotton_A_23711) vs. 1 (Gorai.006G0946) 0.009 0.035 0.258
281 2 (Cotton_A_07488) vs. 1 (Gorai.011G0852) 0.011 0.042 0.256
282 2 (Cotton_A_14663) vs. 1 (Gorai.007G2016) 0.007 0.029 0.256
283 2 (Cotton_A_31974) vs. 1 (Gorai.008G1999) 0.013 0.051 0.256
284b 2 (Cotton_A_16907) vs. 1 (Gorai.009G0058) 0.004 0.017 0.255
285 2 (Cotton_A_15028) vs. 1 (Gorai.007G2884) 0.012 0.046 0.255
286 2 (Cotton_A_23793) vs. 1 (Gorai.005G1802) 0.012 0.048 0.255
287 2 (Cotton_A_28198) vs. 1 (Gorai.006G0256) 0.029 0.114 0.255
288 2 (Cotton_A_16255) vs. 1 (Gorai.009G0096) 0.009 0.037 0.253
289 2 (Cotton_A_04367) vs. 1 (Gorai.005G0092) 0.014 0.055 0.253
290 2 (Cotton_A_41137) vs. 1 (Gorai.012G0916) 0.008 0.031 0.252
291 2 (Cotton_A_30639) vs. 1 (Gorai.002G2377) 0.013 0.051 0.252
292 2 (Cotton_A_22070) vs. 1 (Gorai.006G0131) 0.009 0.035 0.250
293 2 (Cotton_A_07731) vs. 1 (Gorai.007G0907) 0.011 0.042 0.249
294 2 (Cotton_A_32048) vs. 1 (Gorai.009G1272) 0.009 0.037 0.244
295 2 (Cotton_A_18858) vs. 1 (Gorai.010G0754) 0.010 0.043 0.244
296 2 (Cotton_A_39875) vs. 1 (Gorai.004G1105) 0.011 0.045 0.241
297 2 (Cotton_A_02968) vs. 1 (Gorai.007G1523) 0.008 0.034 0.241
298 2 (Cotton_A_15244) vs. 1 (Gorai.008G0378) 0.011 0.046 0.239
299a 2 (Cotton_A_06850) vs. 1 (Gorai.010G0097) 0.012 0.050 0.238
300 2 (Cotton_A_34461) vs. 1 (Gorai.013G1092) 0.011 0.047 0.237
301b 2 (Cotton_A_28288) vs. 1 (Gorai.010G2281) 0.012 0.050 0.236
302 2 (Cotton_A_00777) vs. 1 (Gorai.005G2405) 0.010 0.043 0.235
303 2 (Cotton_A_00501) vs. 1 (Gorai.002G2447) 0.011 0.048 0.234
304a 2 (Cotton_A_27681) vs. 1 (Gorai.008G1926) 0.011 0.046 0.232
305 2 (Cotton_A_01429) vs. 1 (Gorai.008G2890) 0.009 0.041 0.230
306 2 (Cotton_A_36910) vs. 1 (Gorai.008G0795) 0.007 0.032 0.230
307 2 (Cotton_A_13066) vs. 1 (Gorai.006G1904) 0.015 0.064 0.228
308 2 (Cotton_A_22059) vs. 1 (Gorai.008G1383) 0.009 0.040 0.228
309a 2 (Cotton_A_17619) vs. 1 (Gorai.006G0833) 0.014 0.062 0.228
310 2 (Cotton_A_11411) vs. 1 (Gorai.008G0595) 0.006 0.027 0.226
311 2 (Cotton_A_10175) vs. 1 (Gorai.002G2387) 0.009 0.040 0.225
312 2 (Cotton_A_30897) vs. 1 (Gorai.005G1925) 0.009 0.040 0.224
313 2 (Cotton_A_25648) vs. 1 (Gorai.011G0157) 0.013 0.059 0.223
314 2 (Cotton_A_17704) vs. 1 (Gorai.008G0480) 0.007 0.031 0.221
315 2 (Cotton_A_39243) vs. 1 (Gorai.001G1919) 0.010 0.045 0.220
316 2 (Cotton_A_20788) vs. 1 (Gorai.011G0198) 0.006 0.029 0.220
317 2 (Cotton_A_09744) vs. 1 (Gorai.004G1816) 0.008 0.036 0.217
318 2 (Cotton_A_16921) vs. 1 (Gorai.009G0043) 0.009 0.042 0.216
319 2 (Cotton_A_11129) vs. 1 (Gorai.009G0222) 0.008 0.038 0.216
320 2 (Cotton_A_02867) vs. 1 (Gorai.009G4208) 0.008 0.038 0.214
321a 2 (Cotton_A_00514) vs. 1 (Gorai.002G2433) 0.012 0.057 0.211
322 2 (Cotton_A_10654) vs. 1 (Gorai.007G3077) 0.007 0.034 0.210
323 2 (Cotton_A_06403) vs. 1 (Gorai.012G1462) 0.010 0.048 0.209
324 2 (Cotton_A_06599) vs. 1 (Gorai.007G3630) 0.008 0.040 0.209
325b 2 (Cotton_A_06512) vs. 1 (Gorai.002G1010) 0.009 0.046 0.204
326 2 (Cotton_A_13417) vs. 1 (Gorai.006G1548) 0.007 0.033 0.201
327 2 (Cotton_A_14089) vs. 1 (Gorai.001G1496) 0.009 0.044 0.200
328 2 (Cotton_A_06029) vs. 1 (Gorai.012G0133) 0.009 0.044 0.199
329d 2 (Cotton_A_09201) vs. 1 (Gorai.007G0350) 0.844d 4.238d 0.199
330 2 (Cotton_A_00336) vs. 1 (Gorai.002G2612) 0.007 0.034 0.198
331 2 (Cotton_A_00413) vs. 1 (Gorai.002G2536) 0.010 0.049 0.198
332 2 (Cotton_A_11971) vs. 1 (Gorai.004G0145) 0.017 0.087 0.196
333 2 (Cotton_A_23841) vs. 1 (Gorai.011G2097) 0.007 0.036 0.196
334 2 (Cotton_A_32763) vs. 1 (Gorai.001G1035) 0.008 0.040 0.195
335b 2 (Cotton_A_15714) vs. 1 (Gorai.013G0109) 0.009 0.044 0.194
336 2 (Cotton_A_10104) vs. 1 (Gorai.011G2958) 0.011 0.057 0.191
337 2 (Cotton_A_03168) vs. 1 (Gorai.013G2242) 0.012 0.063 0.190
338 2 (Cotton_A_30406) vs. 1 (Gorai.002G0333) 0.008 0.045 0.187
339 2 (Cotton_A_26582) vs. 1 (Gorai.008G2983) 0.006 0.034 0.186
340 2 (Cotton_A_34750) vs. 1 (Gorai.010G1466) 0.009 0.048 0.184
341 2 (Cotton_A_01310) vs. 1 (Gorai.008G2774) 0.008 0.042 0.182
342 2 (Cotton_A_28304) vs. 1 (Gorai.010G2266) 0.012 0.064 0.179
343 2 (Cotton_A_17970) vs. 1 (Gorai.005G2046) 0.006 0.032 0.177
344 2 (Cotton_A_19296) vs. 1 (Gorai.005G1892) 0.010 0.057 0.176
345b 2 (Cotton_A_11911) vs. 1 (Gorai.004G2907) 0.006 0.035 0.174
346 2 (Cotton_A_14952) vs. 1 (Gorai.012G1318) 0.010 0.061 0.171
347 2 (Cotton_A_05786) vs. 1 (Gorai.004G2627) 0.007 0.041 0.170
348 2 (Cotton_A_21706) vs. 1 (Gorai.005G2333) 0.008 0.050 0.170
349 2 (Cotton_A_34969) vs. 1 (Gorai.007G3347) 0.010 0.062 0.168
350 2 (Cotton_A_23465) vs. 1 (Gorai.009G2221) 0.008 0.049 0.168
351 2 (Cotton_A_23222) vs. 1 (Gorai.007G1207) 0.011 0.063 0.167
352 2 (Cotton_A_22065) vs. 1 (Gorai.008G1377) 0.011 0.068 0.163
353 2 (Cotton_A_11026) vs. 1 (Gorai.009G0768) 0.007 0.044 0.159
354 2 (Cotton_A_08048) vs. 1 (Gorai.008G2059) 0.006 0.041 0.153
355 2 (Cotton_A_00793) vs. 1 (Gorai.005G2375) 0.010 0.069 0.152
356b 2 (Cotton_A_26149) vs. 1 (Gorai.009G1519) 0.008 0.050 0.151
357 2 (Cotton_A_07893) vs. 1 (Gorai.008G2279) 0.006 0.039 0.150
358 2 (Cotton_A_01507) vs. 1 (Gorai.003G0040) 0.006 0.037 0.147
359 2 (Cotton_A_30269) vs. 1 (Gorai.006G1151) 0.006 0.042 0.137
360 2 (Cotton_A_09295) vs. 1 (Gorai.007G2759) 0.006 0.042 0.133
361 2 (Cotton_A_00464) vs. 1 (Gorai.002G2481) 0.008 0.058 0.132
362 2 (Cotton_A_12077) vs. 1 (Gorai.006G1689) 0.004 0.034 0.131
363 2 (Cotton_A_16188) vs. 1 (Gorai.011G0940) 0.005 0.035 0.130
364 2 (Cotton_A_16572) vs. 1 (Gorai.006G0758) 0.007 0.058 0.127
365b 2 (Cotton_A_08850) vs. 1 (Gorai.012G1593) 0.004 0.033 0.124
366 2 (Cotton_A_04919) vs. 1 (Gorai.010G2491) 0.004 0.040 0.111
367 2 (Cotton_A_34884) vs. 1 (Gorai.010G0323) 0.005 0.049 0.104
368 2 (Cotton_A_21237) vs. 1 (Gorai.011G2051) 0.002 0.020 0.100
369 2 (Cotton_A_27764) vs. 1 (Gorai.003G1285) 0.004 0.040 0.097
370 2 (Cotton_A_02937) vs. 1 (Gorai.011G0486) 0.004 0.041 0.096
371 2 (Cotton_A_29102) vs. 1 (Gorai.008G0687) 0.004 0.038 0.091
372 2 (Cotton_A_19432) vs. 1 (Gorai.001G2565) 0.004 0.050 0.084
373a 2 (Cotton_A_13069) vs. 1 (Gorai.006G1901) 0.003 0.043 0.063
374 2 (Cotton_A_11956) vs. 1 (Gorai.004G0129) 0.002 0.035 0.048
375a 2 (Cotton_A_28832) vs. 1 (Gorai.002G1171) 0.002 0.047 0.040
376b 2 (Cotton_A_34882) vs. 1 (Gorai.010G0325) 0.002 0.045 0.034
377 2 (Cotton_A_36582) vs. 1 (Gorai.009G3548) 0.000 0.021 0.000
Note:
a This pair of homologous sequences includes a PPR candidate sequence from the A2 genome;
b This pair of homologous sequences includes a PPR candidate sequence from the D5 genome;
c This pair of homologous sequences has the maximum dN/dS value (shown in bold, underlined font);
d This pair of homologous sequences has the maximum dN and dS values (shown in bold, underlined font).
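The dN/dS column of Table S3 is the ratio of the nonsynonymous to the synonymous substitution rate. As a rough illustration only, the sketch below recomputes the ratio from the rounded dN and dS values and attaches the conventional reading (above 1 is consistent with positive selection, below 1 with purifying selection). The function names and labels are hypothetical, and because the table reports dN and dS to three decimals, recomputed ratios will not exactly match the printed dN/dS values; presumably those were derived from unrounded estimates, which is an assumption.

```python
def dn_ds_ratio(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


def selection_label(omega):
    """Conventional coarse interpretation of a dN/dS ratio."""
    if omega is None:
        return "undefined"
    if omega > 1:
        return "positive"
    if omega < 1:
        return "purifying"
    return "neutral"


# (A2 gene, D5 gene, dN, dS) copied from rows 1, 7 and 377 of Table S3.
rows = [
    ("Cotton_A_11317", "Gorai.009G1666", 0.017, 0.009),
    ("Cotton_A_06304", "Gorai.002G0718", 0.014, 0.014),
    ("Cotton_A_36582", "Gorai.009G3548", 0.000, 0.021),
]
for a2, d5, dn, ds in rows:
    omega = dn_ds_ratio(dn, ds)
    print(a2, d5, round(omega, 3), selection_label(omega))
```

Row 7 shows why the rounded inputs should be treated with care: the rounded dN and dS are equal, so the recomputed ratio is exactly 1, while the table reports 0.999.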
Additional file 4: Figure S3. GO annotation of PPR homologous genes between G.
raimondii (D5) and G. arboreum (A2) genomes. (a) GO bar chart of D5-A2 PPR homologs at
second-level GO terms in the three main GO categories (biological process, cellular component
and molecular function). Input list: D5-A2 PPR homologous sequences. Background/reference:
cotton genome locus (Phytozome). (b) GO hierarchical graph of D5-A2 PPR homologs for
biological process. The darker the node color, the more statistically significant the term. (c) GO
hierarchical graph of D5-A2 PPR homologs for cellular component. The darker the node color,
the more statistically significant the term.
Additional file 5: Table S4. Sub-family analysis of 70 PPR candidate genes in G. raimondii
(D5), G. arboreum (A2) and G. hirsutum (AD1) genomes. Table S5. Sub-family analysis of 8
PPR candidate genes in G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1) genomes.
Table S4 Sub-family analysis of 70 PPR candidate genes in G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1) genomes
Chromosome Gene^a Subfamily No. of amino acids No. of motifs Motif arrangement^b
Gossypium raimondii
Chr01 Gorai.001G1316 P 419 12 8-P-P-P-P-P-P-P-P-P-P-P-P-6
Chr02 Gorai.002G0718(1) P 787 18 63-P-22-P-32-P-35-P-P-P-P-P-P-P-P-P-P-P-4-P-P-P-P-13
Chr02 Gorai.002G0718(2) P 787 18 63-P-22-P-32-P-35-P-P-P-P-P-P-P-P-P-P-P-4-P-P-P-P-13
Chr02 Gorai.002G1010 P 1063 25 84-P-P-P-P-P-P-P-P-3-P-P-P-P-P-P-6-P-P-1-P-3-P-P-P-P-39-P-5-P-P-P-68
Chr03 Gorai.003G1716 P 598 13 115-P-P-P-P-P-P-S-5-P-P-P-1-P-P-P-28
Chr04 Gorai.004G2406 P 509 12 69-P-P-P-P-3-P-P-P-P-P-P-P-P-21
Chr04 Gorai.004G2438 P 647 15 101-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-21
Chr04 Gorai.004G2907 P 584 11 19-P-35-P-P-3-P-P-P-P-P-P-P-P-144
Chr06 Gorai.006G2252 P 508 4 144-P-P-1-P-P-227
Chr06 Gorai.006G2471 P 638 14 106-P-P-P-P-P-P-P-S-4-P-P-3-P-3-P-P-P-41
Chr07 Gorai.007G1431 P 366 9 30-P-P-P-P-P-P-2-P-P-P-35
Chr08 Gorai.008G0443 P 631 13 152-P-1-P-P-P-P-P-P-P-P-P-P-P-P-25
Chr09 Gorai.009G0058 P 716 15 167-P-P-P-P-P-2-P-P-P-P-P-P-P-P-P-P-27
Chr09 Gorai.009G1519 P 692 14 141-P-P-P-P-P-P-3-P-P-P-3-P-P-P-P-33-P-35
Chr09 Gorai.009G2580 P 632 12 86-P-43-P-P-P-P-P-P-P-P-P-P-5-P-87
Chr09 Gorai.009G3762 P 435 6 166-P-P-8-P-P-P-P
Chr10 Gorai.010G0325(1) P 595 14 105-P-P-P-2-P-P-P-P-P-P-P-P-P-P-P-11
Chr10 Gorai.010G0325(2) P 554 14 64-P-P-P-2-P-P-P-P-P-P-P-P-P-P-P-11
Chr10 Gorai.010G0325(3) P 595 14 105-P-P-P-2-P-P-P-P-P-P-P-P-P-P-P-11
Chr10 Gorai.010G0536(1) P 722 14 117-P-36-S-4-P-P-P-P-P-1-P-P-41-P-P-P-P-P-46
Chr10 Gorai.010G0536(2) P 646 14 41-P-37-S-4-P-P-P-P-P-1-P-P-41-P-P-P-P-P-46
Chr10 Gorai.010G0722(1) P 632 10 110-P-P-P-4-P-P-P-5-P-P-P-P-136
Chr10 Gorai.010G0722(2) P 632 10 110-P-P-P-4-P-P-P-5-P-P-P-P-136
Chr10 Gorai.010G2281(1) P 638 14 122-P-5-P-P-1-P-P-P-S-4-P-P-1-P-2-P-P-P-P-24
Chr10 Gorai.010G2281(2) P 638 14 122-P-5-P-P-1-P-P-P-S-4-P-P-1-P-2-P-P-P-P-24
Chr11 Gorai.011G1450(1) P 586 15 68-P-P-P-P-P-P-P-P-P-P-P-P-P-P-2-P
Chr11 Gorai.011G1451(1) P 558 14 47-P-P-P-P-P-P-P-P-P-P-P-P-P-P-21
Chr11 Gorai.011G1464(1) P 519 13 43-P-P-P-P-P-P-P-P-1-P-P-P-P-P-21
Chr11 Gorai.011G1464(2) P 367 10 9-P-P-P-P-P-1-P-P-P-P-P-8
Chr11 Gorai.011G1465 P 93 3 8-P-P-P
Chr11 Gorai.011G1466 P 371 8 99-P-P-P-P-P-P-P-P-2
Chr11 Gorai.011G1511 P 73 1 25-P-13
Chr11 Gorai.011G1512 P 626 15 54-P-35-P-P-P-P-P-P-P-P-P-P-P-P-P-P-21
Chr11 Gorai.011G1514 P 442 10 113-P-P-P-P-P-P-P-P-P-P
Chr11 Gorai.011G1515 P 824 20 65-P-P-P-P-P-P-P-P-2-P-P-P-P-2-P-1-P-63-P-P-P-P-P-P-1
Chr11 Gorai.011G1557 P 536 13 67-P-1-P-P-P-P-P-P-P-19-P-P-P-2-P-P-17
Chr12 Gorai.012G0303(1) P 524 10 136-P-P-P-1-P-P-P-P-P-17-P-P-20
Chr12 Gorai.012G0303(2) P 431 9 44-P-62-P-P-P-P-P-P-P-P-14
Chr12 Gorai.012G1205(1) P 763 14 193-P-P-P-P-P-P-P-P-P-P-P-P-P-P-84
Chr12 Gorai.012G1205(2) P 763 14 193-P-P-P-P-P-P-P-P-P-P-P-P-P-P-84
Chr12 Gorai.012G1494(1) P 536 9 168-P-P-P-P-P-P-P-P-1-P-55
Chr12 Gorai.012G1494(2) P 384 9 16-P-P-P-P-P-P-P-P-1-P-55
Chr12 Gorai.012G1593(1) P 868 18 72-S-6-P-P-P-P-P-P-P-P-P-P-P-25-P-P-P-P-P-45-P-106
Chr12 Gorai.012G1593(2) P 868 18 72-S-6-P-P-P-P-P-P-P-P-P-P-P-25-P-P-P-P-P-45-P-106
Chr12 Gorai.012G1593(3) P 868 18 72-S-6-P-P-P-P-P-P-P-P-P-P-P-25-P-P-P-P-P-45-P-106
Chr13 Gorai.013G0109(1) P 960 15 316-P-P-P-2-P-1-P-P-P-P-P-P-P-P-P-P-P-123
Chr13 Gorai.013G0109(2) P 755 12 316-P-P-P-2-P-1-P-P-P-P-P-P-P-P-23
Chr13 Gorai.013G0606(1) P 555 13 43-P-P-35-P-P-P-P-P-P-P-P-2-P-P-P-22
Chr13 Gorai.013G0606(2) P 416 11 9-P-P-P-P-P-P-P-P-2-P-P-P-22
Gossypium arboreum
Chr01 Cotton_A_32157 P 195 5 9-P-P-P-P-P-13
Chr03 Cotton_A_26557 P 440 11 8-P-4-P-P-P-P-P-P-P-2-P-P-P-45
Chr03 Cotton_A_03817 P 523 9 153-S-4-P-3-P-P-4-P-P-2-P-35-P-P-24
Chr04 Cotton_A_06850 P 704 7 368-P-S-5-P-35-P-P-P-P-21
Chr05 Cotton_A_26837 P 415 11 8-P-P-P-P-P-P-P-P-P-P-P-22
Chr06 Cotton_A_04606 P 566 8 169-P-P-34-P-P-P-P-35-P-P-48
Chr06 Cotton_A_29300 P 587 14 54-P-35-P-P-1-P-P-P-P-P-P-P-P-P-P-P-21
Chr07 Cotton_A_16847 P 573 13 103-P-P-P-P-P-3-P-P-P-P-P-P-P-P-12
Chr07 Cotton_A_18522 P 622 14 111-P-P-P-P-P-P-P-P-P-P-P-P-P-P-21
Chr08 Cotton_A_02931 P 734 14 173-P-P-P-P-3-P-P-P-P-P-3-P-P-P-P-36-P-39
Chr09 Cotton_A_27681 P 759 19 38-P-9-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-4-P-P-P-54
Chr10 Cotton_A_13069 P 466 8 130-S-3-P-P-5-P-P-P-P-P-54
Chr10 Cotton_A_17619 P 398 6 175-P-P-P-P-P-P-16
Chr10 Cotton_A_14743 P 544 14 8-P-35-P-P-P-P-P-P-P-P-P-P-P-P-P-21
Chr12 Cotton_A_37656 P 1525 18 239-P-37-P-P-P-P-P-P-P-72-P-36-P-1-S-41-P-41-P-P-35-P-1-P-P-35-P-376
Chr12 Cotton_A_28832 P 706 14 165-P-3-P-41-P-1-P-1-P-P-P-P-P-P-P-P-P-P-21
Chr12 Cotton_A_00514 P 817 20 108-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-1-P-1-P-12
Gossypium hirsutum
GhBah0036h09 P 480 8 104-S-42-P-41-P-P-P-P-P-1-P-24
GhDeg5330 P 471 9 90-S-5-S-4-P-P-4-P-P-P-P-P-66
GhI12 P 458 7 118-P-36-P-P-P-P-P-4-P-59
GhK14 P 846 16 273-P-P-P-P-P-P-3-P-P-P-P-P-3-P-P-P-P-P-13
GhPPR3 P 547 10 155-P-2-P-1-P-P-P-P-1-P-P-5-P-1-P-42
GhPPR4 P 337 4 175-P-P-P-P-22
GhPPR5 P 288 7 11-P-P-2-P-P-2-P-P-5-P-29
GhPPRH1 P 638 17 P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-1-P-47
GhPPRH2 P 851 16 237-P-P-P-P-P-P-P-P-P-P-P-35-P-P-P-P-P-21
GhMX55E05 E 522 12 87-L-7-S-3-P-L-S-P-L-S-P-L2-S-9-E
GhPPR1 E 532 11 73-P-L-S-S-P-4-L-S-P-L2-S-4-E-46
GhCRR4 E+ 637 15 85-P-L-S-S-2-S-S-S-P-4-L-2-S-P-L2-1-S-4-E-E+-12
Gh155c17 DYW 875 21 65-P-2-L-2-S-P-L-S-P-L-1-S-P-L-4-S-P-2-L-S-P-L2-S-5-E-E+-DYW
GhMX089E03 DYW 775 19 32-S-P-L-S-P-L-S-P-L-S-P-3-L-1-S-P-L2-S-4-E-E+-DYW
GhPPR2 DYW 592 11 116-L-1-S-P-L-S-P-L2-S-7-E-E+-DYW
Note: a Two or more transcripts of the same gene are distinguished by the numbers in parentheses; b Numbers in the motif
arrangement give the number of amino acids between two adjacent motifs.
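The motif-arrangement notation described in the note can be checked mechanically. The following is a minimal sketch, not the authors' pipeline: the function name parse_motif_arrangement is hypothetical, and it assumes that every purely numeric token is a spacer length in amino acids and every other token (P, S, L, L2, E, E+, DYW) is a motif label. Counting the labels should reproduce the "No. of motifs" column.

```python
from collections import Counter


def parse_motif_arrangement(arrangement):
    """Split a motif-arrangement string into motif labels and spacer lengths.

    Assumption (hypothetical helper): tokens are separated by "-";
    numeric tokens are spacer lengths, all other tokens are motif labels.
    """
    motifs, spacers = [], []
    for token in arrangement.split("-"):
        if token.isdigit():
            spacers.append(int(token))
        elif token:
            motifs.append(token)
    return motifs, spacers


# Two rows taken from Table S4: (arrangement string, listed motif count)
rows = {
    "Gorai.010G0536(2)": ("41-P-37-S-4-P-P-P-P-P-1-P-P-41-P-P-P-P-P-46", 14),
    "GhPPR2": ("116-L-1-S-P-L-S-P-L2-S-7-E-E+-DYW", 11),
}

for gene, (arrangement, listed) in rows.items():
    motifs, spacers = parse_motif_arrangement(arrangement)
    status = "matches" if len(motifs) == listed else "differs from"
    print(gene, dict(Counter(motifs)), f"{len(motifs)} motifs, {status} the table")
```

Splitting on "-" keeps the "+" attached to the E+ label, so E and E+ are counted as distinct motif types.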
Table S5 Sub-family analysis of 8 PPR candidate genes in G. raimondii (D5), G. arboreum (A2) and G. hirsutum
(AD1) genomes
Gene Subfamily No. of amino acids Motif number Motif arrangement
Gorai.006G2471 P 638 14 106-P-P-P-P-P-P-P-S-4-P-P-3-P-3-P-P-P-41
Gorai.005G0470 P 641 14 124-P-P-P-P-P-P-P-P-P-3-P-1-P-P-P-P-29
Cotton_A_08373 P 643 14 124-P-P-P-P-P-P-P-P-P-P-1-P-P-P-P-29
Cotton_A_26557 P 440 11 8-P-4-P-P-P-P-P-P-P-2-P-P-P-45
Gorai.007G1431 P 366 9 30-P-P-P-P-P-P-2-P-P-P-35
PhRf_PPR592 P 592 13 44-P-70-P-3-P-P-P-P-P-P-P-P-P-P-P-33
OsRf1a P 791 17 87-P-P-P-6-P-P-P-P-P-P-P-1-P-P-P-P-P-2-P-4-P-107
OsRf1b P 506 12 28-P-37-P-1-P-P-2-P-P-P-P-P-P-P-P-30
GhPPR3 P 547 10 155-P-2-P-1-P-P-P-P-1-P-P-5-P-1-P-42
GhK14 P 846 16 273-P-P-P-P-P-P-3-P-P-P-P-P-3-P-P-P-P-P-13
Gorai.010G0536(1) P 646 14 41-P-37-S-4-P-P-P-P-P-1-P-P-41-P-P-P-P-P-46
Gorai.010G0536(2) P 722 14 117-P-36-S-4-P-P-P-P-P-1-P-P-41-P-P-P-P-P-46
BnPPR_B_L1 P 667 15 109-P-P-1-P-P-4-P-P-P-P-P-P-P-11-P-P-P-P-25
RsRfo(PPR_B) P 687 16 51-P-35-P-P-P-P-3-P-P-P-P-P-P-P-13-P-P-P-P-38
AtRPF1 P 602 13 121-P-P-P-P-P-P-P-P-P-P-P-P-P-26
Additional file 6: Table S6. List of primers used for qRT-PCR.
Table S6 List of primers used for qRT-PCR
Gene Sequence (5' to 3') Tm (℃) Length (bp)
Gorai.005G0470 F: TGGTCAGTCTCCAGCGTTATCTACA 62.0 25
R: GTATGCTGAAATGCTCAATGCTCG 60.3 24
Gorai.006G2471 F: GAGCCTGATTACGCTACTCTTGG 62.0 23
R: AAAACATCACCTTGAAACCCTCTT 56.8 24
Gorai.007G1431 F: GAGAAGTTGGAAGAAGCGAATCAGTT 60.4 26
R: CTTACCAGCCAAGCAATACCCATC 62.0 24
Gorai.010G0536 F: CATTGATGGGAAACCAACCGTG 60.1 22
R: GTGGATGCAACTGGTGGAGGAC 63.8 22
Cotton_A_26557 F: AGGCAGGAAAGGTTGACGAAGC 61.9 22
R: CCAGTGCCTCTGAGTCACAATCG 63.7 23
Cotton_A_08373 F: TTCCAAGAAGGGCAAGTGAGC 60.0 21
R: ATCAAAAGCCTCCTCAATGTGG 58.2 22
GhPPR3 F: TTTGTTGAGGTTAGACGAGGTTTAC 58.7 25
R: TCATACTTCTTCGCCTTACAATACG 58.7 25
GhK14 F: TCTCTCCTAACAATCCTCCTACCGT 62.0 25
R: GACATCAATAGCGTAAGTAAAACCCAC 60.5 27
UBQ7 F: GAAGGCATTCCACCTGACCAAC 61.9 22
R: CTTGACCTTCTTCTTCTTGTGCTTG 60.3 25
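The "Length (bp)" column of Table S6 can be verified directly from the primer sequences. The sketch below checks a few primers from the table and also reports GC content. The helper names gc_content and length_mismatches are hypothetical, and the Tm values are not recomputed because the table does not state which formula produced them.

```python
def gc_content(seq):
    """Return the GC percentage of a primer sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def length_mismatches(primers):
    """Return names of primers whose sequence length differs from the listed length."""
    return [name for name, (seq, listed) in primers.items() if len(seq) != listed]


# A subset of Table S6: primer name -> (sequence 5' to 3', listed length in bp)
primers = {
    "Gorai.005G0470 F": ("TGGTCAGTCTCCAGCGTTATCTACA", 25),
    "Gorai.005G0470 R": ("GTATGCTGAAATGCTCAATGCTCG", 24),
    "GhK14 F": ("TCTCTCCTAACAATCCTCCTACCGT", 25),
    "GhK14 R": ("GACATCAATAGCGTAAGTAAAACCCAC", 27),
    "UBQ7 F": ("GAAGGCATTCCACCTGACCAAC", 22),
    "UBQ7 R": ("CTTGACCTTCTTCTTCTTGTGCTTG", 25),
}

print("length mismatches:", length_mismatches(primers))
for name, (seq, _) in primers.items():
    print(f"{name}: {len(seq)} bp, GC = {gc_content(seq):.1f}%")
```

For the six primers shown, the sequence lengths agree with the listed values.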
Acknowledgments
We are indebted to Dr. Anming Ding (Tobacco Research Institute, Chinese Academy of Agricultural
Sciences, Qingdao, China) for supplying the HMMER matrix of the PPR gene family in Arabidopsis
(defined by Prof. Ian Small). We thank Dr. Zhen Su (State Key Laboratory of Plant Physiology and
Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, China) for
helpful advice and discussion. This work was supported by the National Natural Science Foundation
of China (31671741) and the National Key R & D Program for Crop Breeding (2016YFD0100203) to
J. Hua.
36
References
Akagi H., Nakamura A., Yokozeki-Misono Y., Inagaki A., Takahashi H., Mori K. et al. 2004 Positional
cloning of the rice Rf-1 gene, a restorer of BT-type cytoplasmic male sterility that encodes a
mitochondria-targeting PPR protein. Theor. Appl. Genet. 108, 1449-1457.
Altschul S. F., Gish W., Miller W., Myers E. W. and Lipman D. J. 1990 Basic local alignment search
tool. J. Mol. Biol. 215, 403-410.
Aubourg S., Boudet N., Kreis M. and Lecharny A. 2000 In Arabidopsis thaliana, 1% of the genome
codes for a novel protein family unique to plants. Plant Mol. Biol. 42, 603-613.
Barkan A. and Small I. 2014 Pentatricopeptide repeat proteins in plants. Annu. Rev. Plant Biol. 65,
415-442.
Bentolila S., Alfonso A. A. and Hanson M. R. 2002 A pentatricopeptide repeat-containing gene restores
fertility to cytoplasmic male-sterile plants. Proc. Natl. Acad. Sci. U S A 99, 10887-10892.
Brown G. G., Formanova N., Jin H., Wargachuk R., Dendy C., Patil P. et al. 2003 The radish Rfo
restorer gene of Ogura cytoplasmic male sterility encodes a protein with multiple
pentatricopeptide repeats. Plant J. 35, 262-272.
Carlsson J., Leino M., Sohlberg J., Sundstrom J. F. and Glimelius K. 2008 Mitochondrial regulation of
flower development. Mitochondrion 8, 74-86.
Chen, Z., Feng, K., Grover, C.E., Li, P., Liu, F., Wang, Y., et al. 2016 Chloroplast DNA structural
variation, phylogeny, and age of divergence among diploid cotton species. PLoS ONE 11,
e0157183.
Chen, Z., Grover, C.E., Li, P., Wang, Y., Nie, H., Zhao, Y., et al. 2017a Molecular evolution of the
plastid genome during diversification of the cotton genus, Mol. Phylogenet. Evol.112, 268-278.
Chen, Z., Nie, H., Grover, C.E., Wang, Y., Li, P., Wang, M. et al. 2017b Entire nucleotide sequences
of Gossypium raimondii and G. arboreum mitochondrial genomes revealed A-genome species as
cytoplasmic donor of the allotetraploid species. Plant Biol. 19, 484-493.
Chen Z, Zhao N, Li S, Grover CE, Nie H, Wendel JF, et al. 2017c Plant mitochondrial genome
evolution and cytoplasmic male sterility. Crit Rev Plant Sci. 36, 55–69.
Cui X., Wise R. P. and Schnable P. S. 1996 The rf2 nuclear restorer gene of male-sterile T-cytoplasm
maize. Science 272, 1334-1336.
Cushing D. A., Forsthoefel N. R., Gestaut D. R. and Vernon D. M. 2005 Arabidopsis emb175 and other
ppr knockout mutants reveal essential roles for pentatricopeptide repeat (PPR) proteins in plant
embryogenesis. Planta 221, 424-436.
Desloire S., Gherbi H., Laloui W., Marhadour S., Clouet V., Cattolico L. et al. 2003 Identification of
the fertility restoration locus, Rfo, in radish, as a member of the pentatricopeptide-repeat protein
family. EMBO Rep. 4, 588-594.
Dewey R. E., Timothy D. H. and Levings C. S. 1987 A mitochondrial protein associated with
cytoplasmic male sterility in the T cytoplasm of maize. Proc. Natl. Acad. Sci. U S A 84,
5374-5378.
Feng C. D., Stewart J. M. and Zhang J. F. 2005 STS markers linked to the Rf1 fertility restorer gene of
cotton. Theor. Appl. Genet. 110, 237-243.
37
Fujii S., Bond C. S. and Small I. D. 2011 Selection patterns on restorer-like genes reveal a conflict
between nuclear and mitochondrial genomes throughout angiosperm evolution. Proc. Natl. Acad.
Sci. U S A 108, 1723-1728.
Fujii S. and Toriyama K. 2009 Suppressed expression of retrograde-regulated male sterility restores
pollen fertility in cytoplasmic male sterile rice plants. Proc. Natl. Acad. Sci. U S A 106,
9513-9518.
Gallagher J. P., Grover C. E., Rex K., Moran M. and Wendel J. F. 2017 A new species of cotton from
Wake Atoll, Gossypium stephensii (Malvaceae). Syst. Bot. 42, 115-123.
Galtier N. 2011 The intriguing evolutionary dynamics of plant mitochondrial DNA. BMC Biol. 9, 61.
Geddy R. and Brown G. G. 2007 Genes encoding pentatricopeptide repeat (PPR) proteins are not
conserved in location in plant genomes and may be subject to diversifying selection. BMC
Genomics 8, 130.
Germain A., Hotto A. M., Barkan A. and Stern D. B. 2013 RNA processing and decay in plastids. Wiley
Interdiscip. Rev. RNA 4, 295-316.
Giancola S., Marhadour S., Desloire S., Clouet V., Falentin-Guyomarc'h H., Laloui W. et al. 2003
Characterization of a radish introgression carrying the Ogura fertility restorer gene Rfo in
rapeseed, using the Arabidopsis genome sequence and radish genetic mapping. Theor. Appl.
Genet. 107, 1442-1451.
Gillman J. D., Bentolila S. and Hanson M. R. 2007 The petunia restorer of fertility protein is part of a
large mitochondrial complex that interacts with transcripts of the CMS-associated locus. Plant J.
49, 217-227.
Hashimoto M., Endo T., Peltier G., Tasaka M. and Shikanai T. 2003 A nucleus-encoded factor, CRR2, is
essential for the expression of chloroplast ndhB in Arabidopsis. Plant J. 36, 541-549.
Howell M. D., Fahlgren N., Chapman E. J., Cumbie J. S., Sullivan C. M., Givan S. A. et al. 2007
Genome-wide analysis of the RNA-denpendent RNA polymerase6/DICER-like4 pathway in
Arabidopsis reveals dependency on miRNA- and tasiRNA-directed targeting. Plant Cell 19,
926-942.
Hu J., Wang K., Huang W., Liu G., Gao Y., Wang J. et al. 2012 The rice pentatricopeptide repeat protein
RF5 restores fertility in Hong-Lian cytoplasmic male-sterile lines via a complex with the
glycine-rich protein GRP162. Plant Cell 24, 109-122.
Itabashi E., Iwata N., Fujii S., Kazama T. and Toriyama K. 2011 The fertility restorer gene, Rf2, for
lead rice-type cytoplasmic male sterility of rice encodes a mitochondrial glycine-rich protein.
Plant J. 65, 359-367.
Janska H., Sarria R., Woloszynska M., Arrieta-Montiel M. and Mackenzie S. A. 1998 Stoichiometric
shifts in the common bean mitochondrial genome leading to male sterility and spontaneous
reversion to fertility. Plant Cell 10, 1163-1180.
Jore M. M., Lundgren M., van Duijn E., Bultema J. B., Westra E. R., Waghmare S. P. et al. 2011
Structural basis for CRISPR RNA-guided DNA recognition by Cascade. Nat. Struct. Mol. Biol. 18,
529-536.
Katoh K. and Standley D. M. 2013 MAFFT multiple sequence alignment software version 7:
improvements in performance and usability. Mol. Biol. Evol. 30, 772-780.
Kazama T. and Toriyama K. 2003 A pentatricopeptide repeat-containing gene that promotes the
processing of aberrant atp6 RNA of cytoplasmic male-sterile rice. FEBS Lett. 544, 99-02.
38
Klein R. R., Klein P. E., Mullet J. E., Minx P., Rooney W. L. and Schertz K. F. 2005 Fertility restorer
locus Rf1 of sorghum (Sorghum bicolor L.) encodes a pentatricopeptide repeat protein not present
in the colinear region of rice chromosome 12. Theor. Appl. Genet. 111, 994-1012.
Koizuka N., Imai R., Fujimoto H., Hayakawa T., Kimura Y., Kohno-Murase J. et al. 2003 Genetic
characterization of a pentatricopeptide repeat protein gene, orf687, that restores fertility in the
cytoplasmic male-sterile Kosena radish. Plant J. 34, 407-415.
Komori T., Ohta S., Murai N., Takakura Y., Kuraya Y., Suzuki S. et al. 2004 Map-based cloning of a
fertility restorer gene, Rf-1, in rice (Oryza sativa L.). Plant J. 37, 315-325.
Lei B., Li S., Liu G., Chen Z., Su A., Li P. et al. 2013 Evolution of mitochondrial gene content: loss of
genes, tRNAs and introns between Gossypium harknessii and other plants. Plant Syst. Evol. 299,
1889-1897.
Li F., Fan G., Wang K., Sun F., Yuan Y., Song G. et al. 2014 Genome sequence of the cultivated cotton
Gossypium arboreum. Nat. Genet. 46, 567-572.
Li S., Liu G., Chen Z., Wang Y., Li P., Hua J. 2013. Construction and initial analysis of five Fosmid
libraries of mitochondrial genomes of cotton (Gossypium). Chinese Sci. Bull. 58, 4608-4615.
Li P., Cao M., Yang L., Xu A. and Liu H. 2007 Mapping of fertility restorer gene for cotton cytoplasmic
male sterile line Jin A. Acta Bot. Bor-Occid. Sin. 27 1937-1942.
Liu F., Cui X., Horner H. T., Weiner H. and Schnable P. S. 2001 Mitochondrial aldehyde
dehydrogenase activity is required for male fertility in maize. Plant Cell 13, 1063-1078.
Liu L., Guo W., Zhu X. and Zhang T. 2003 Inheritance and fine mapping of fertility restoration for
cytoplasmic male sterility in Gossypium hirsutum L. Theor. Appl. Genet. 106, 461-469.
Luo D., Xu H., Liu Z., Guo J., Li H., Chen L. et al. 2013 A detrimental mitochondrial-nuclear
interaction causes cytoplasmic male sterility in rice. Nat. Genet. 45, 573-577.
Lurin C., Andres C., Aubourg S., Bellaoui M., Bitton F., Bruyere C. et al. 2004 Genome-wide analysis
of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle
biogenesis. Plant Cell 16, 2089-2103.
Mackenzie S. A. and Chase C. D. 1990 Fertility restoration is associated with loss of a portion of the
mitochondrial genome in cytoplasmic male-sterile common bean. Plant Cell 2, 905-912.
Matsuhira H., Kagami H., Kurata M., Kitazaki K., Matsunaga M., Hamaguchi Y. et al. 2012 Unusual
and typical features of a novel restorer-of-fertility gene of sugar beet (Beta vulgaris L.). Genetics
192, 1347-1358.
Meierhoff K., Felder S., Nakamura T., Bechtold N. and Schuster G. 2003 HCF152, an Arabidopsis
RNA binding pentatricopeptide repeat protein involved in the processing of chloroplast
psbB-psbT-psbH-petB-petD RNAs. Plant Cell 15, 1480-1495.
Melonek J., Stone J. D. and Small I. 2016 Evolutionary plasticity of restorer-of-fertility-like proteins in
rice. Sci. Rep. UK 6, 35152.
Meyer V. G. 1975 Male sterility from Gossypium harknessii. J. Heredity 66, 23-27.
Mistry J., Finn R. D., Eddy S. R. Bateman A. and Punta M. 2013 Challenges in homology search:
HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res. 41, e121.
Nakamura T., Meierhoff K., Westhoff P. and Schuster G. 2003 RNA-binding properties of HCF152, an
Arabidopsis PPR protein involved in the processing of chloroplast RNA. Eur. J. Biochem. 270,
4070-4081.
O'Toole N., Hattori M., Andres C., Iida K., Lurin C., Schmitz-Linneweber C. et al. 2008 On the
expansion of the pentatricopeptide repeat gene family in plants. Mol. Biol. Evol. 25, 1120-1128.
39
Schnable P. S. and Wise R. P. 1998 The molecular basis of cytoplasmic male sterility and fertility
restoration. Trends Plant Sci. 3, 175-180.
Small I. D. and Peeters N. 2000 The PPR motif - a TPR-related motif prevalent in plant organellar
proteins. Trends Biochem. Sci. 25, 46-47.
Suzuki H., Yu J., Ness S. A., O'Connell M. A. and Zhang J. 2013 RNA editing events in mitochondrial
genes by ultra-deep sequencing methods: a comparison of cytoplasmic male sterile, fertile and
restored genotypes in cotton. Mol. Genet. Genomics 288, 445-457.
Sykes T., Yates S., Nagy I., Asp T., Small I. and Studer B. 2017 In silico identification of candidate
genes for fertility restoration in cytoplasmic male sterile perennial ryegrass (Lolium perenne L.).
Genome Biol. Evol. 9, 351-362.
Tamura K., Peterson D., Peterson N., Stecher G., Nei M. and Kumar S. 2011 MEGA5: molecular
evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum
parsimony methods. Mol. Biol. Evol. 28, 2731-2739.
Wang F., Yue B., Hu J. G., Stewart J. M. and Zhang J. F. 2009 A target region amplified polymorphism
marker for fertility restorer gene Rf1 and chromosomal localization of Rf1 and Rf2 in cotton. Crop
Sci. 49, 1602-1608.
Wang Z., Zou Y., Li X., Zhang Q., Chen L., Wu H. et al. 2006 Cytoplasmic male sterility of rice with
boro II cytoplasm is caused by a cytotoxic peptide and is restored by two related PPR motif genes
via distinct modes of mRNA silencing. Plant Cell 18, 676-687.
Wang Z. W., De Wang C., Gao L., Mei S. Y., Zhou Y., Xiang C. P. et al. 2013 Heterozygous alleles
restore male fertility to cytoplasmic male-sterile radish (Raphanus sativus L.): a case of
overdominance. J. Exp. Bot. 64, 2041-2048.
Wendel J. F. and Grover C. E. 2015 Taxonomy and evolution of the cotton genus. In: Fang D and Percy
R, editors. Cotton. American Society of Agronomy, Inc., Crop Science Society of America, Inc.,
and Soil Science Society of America, Inc., Madison, WI, pp. 25-44
Wu J. Y., Cao X. X., Guo L. P., Qi T. X., Wang H. L., Tang H. N. et al. 2014 Development of a
candidate gene marker for Rf1 based on a PPR gene in cytoplasmic male sterile CMS-D2 upland
cotton. Mol. Breeding 34, 231-240.
Xia X. and Xie Z. 2001 DAMBE: Software package for data analysis in molecular biology and
evolution. J. Heredity 92, 371-373.
Yang Z. and Nielsen R. 2000 Estimating synonymous and nonsynonymous substitution rates under
realistic evolutionary models. Mol. Biol. Evol. 17, 32-43.
Yin J., Guo W., Yang L., Liu L. and Zhang T. 2006 Physical mapping of the Rf1 fertility-restoring gene
to a 100 kb region in cotton. Theor. Appl. Genet. 112, 1318-1325.
Zabala G., Gabay-Laughnan S. and Laughnan J. R. 1997 The nuclear gene Rf3 affects the expression of
the mitochondrial chimeric sequence R implicated in S-type male sterility in maize. Genetics 147,
847-860.
Zhang J. F. and Stewart J. M. 2004 Identification of molecular markers linked to the fertility restorer
genes for CMS-D8 in cotton. Crop Sci. 44, 1209-1217.
Zhang T., Hu Y., Jiang W., Fang L., Guan X. and Chen J. 2015 Sequencing of allotetraploid cotton
(Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat. Biotechnol.
33, 531-537.
40
Zhang X., Wang L., Xu X., Cai C. and Guo W. 2014 Genome-wide identification of mitogen-activated
protein kinase gene family in Gossypium raimondii and the function of their corresponding
orthologs in tetraploid cultivated cotton. BMC Plant Biol. 14, 345.
Zhao L., Yuanda L., Caiping C., Xiangchao T., Xiangdong C., Wei Z. et al. 2012 Toward allotetraploid
cotton genome assembly: integration of a high-density molecular genetic linkage map with DNA
sequence information. BMC Genomics 13, 539.
Table captions
Table 1 Identification of PPR gene family in D5 and A2 genomes of Gossypium^a
                G. raimondii                         G. arboreum
Chromosome^b    No. of PPR loci  No. of PPR motifs   No. of PPR loci  No. of PPR motifs
chr01           35               210                 32               149
chr02           30               229                 18               218
chr03           20               195                 26               154
chr04           29               241                 33               172
chr05           40               306                 27               157
chr06           37               299                 49               225
chr07           45               244                 25               148
chr08           48               271                 33               218
chr09           75               799                 34               191
chr10           32               235                 31               155
chr11           32               349                 42               231
chr12           26               221                 39               205
chr13           33               244                 30               144
Total No.^c     482              3843                433^d            2367
Note:
a The numbers of PPR genes in the D5 and A2 genomes are "clean" data obtained after two rounds of filtering.
b A double underline marks the chromosome with the largest number of PPR loci and the corresponding maximum value; a single
underline marks the chromosome with the smallest number of PPR loci and the corresponding minimum value.
c The total numbers of PPR genes in the D5 and A2 genomes are shown in bold.
d Of these, 14 PPR loci were identified on large scaffolds.
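The totals in Table 1 can be cross-checked against the per-chromosome counts. This is a minimal sketch using only the values printed in the table; the variable names are illustrative. It assumes that the 14 scaffold loci from note d are included in the G. arboreum locus total, which the arithmetic supports.

```python
# Per-chromosome counts from Table 1:
# (D5 loci, D5 motifs, A2 loci, A2 motifs)
table1 = {
    "chr01": (35, 210, 32, 149),
    "chr02": (30, 229, 18, 218),
    "chr03": (20, 195, 26, 154),
    "chr04": (29, 241, 33, 172),
    "chr05": (40, 306, 27, 157),
    "chr06": (37, 299, 49, 225),
    "chr07": (45, 244, 25, 148),
    "chr08": (48, 271, 33, 218),
    "chr09": (75, 799, 34, 191),
    "chr10": (32, 235, 31, 155),
    "chr11": (32, 349, 42, 231),
    "chr12": (26, 221, 39, 205),
    "chr13": (33, 244, 30, 144),
}
reported_totals = (482, 3843, 433, 2367)
a2_scaffold_loci = 14  # note d of Table 1

# Sum each column across the 13 chromosomes
column_sums = tuple(sum(col) for col in zip(*table1.values()))
print("chromosome sums:", column_sums)
print("A2 loci including scaffolds:", column_sums[2] + a2_scaffold_loci)
print("reported totals:", reported_totals)
```

The chromosome-level sums reproduce the reported totals for three columns directly, and the G. arboreum locus count matches once the 14 scaffold loci are added.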
Table 2 Expression analysis of PPR candidate genes in G. raimondii (D5), G. arboreum (A2) and G. hirsutum
(AD1) genomes
Gene RPKM value-Fold
2074A 2074B AE1
Gorai.010G0536(1) 1.00 1.50 5.71
Gorai.010G0536(2) 1.00 1.63 6.57
Gorai.007G1431 1.00 1.20 21.93
Cotton_A_26557 1.00 1.32 20.84
Gorai.006G2471 1.00 0.16 1.27
Cotton_A_08373 1.00 0.41 12.59
GhPPR3 1.00 0.00 5.22
GhK14 1.00 0.05 4.93
Gorai.005G0470 1.00 0.63 3.46