
Research Article

Genome-wide identification of PPR gene family and prediction analysis on restorer gene in Gossypium

NAN ZHAO 1, YUMEI WANG 1 †, and JINPING HUA ∗

Laboratory of Cotton Genetics, Genomics and Breeding /Key Laboratory of Crop Heterosis and Utilization of Ministry of Education /Beijing Key Laboratory of Crop Genetic Improvement, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, China

† Research Institute of Cash Crops, Hubei Academy of Agricultural Sciences, Wuhan 430064, Hubei, China

1 These authors contributed equally to this work.

Email:

Nan Zhao: [email protected]

Yumei Wang: [email protected]

Jinping Hua: [email protected]

∗For correspondence E-mail: [email protected]

Running title

PPR gene evolution in Gossypium species.

Keywords. Gossypium; PPR gene family; phylogenetic analysis; cytoplasmic male sterility; restorer gene.


Abstract

The PPR (pentatricopeptide repeat) gene family plays an essential role in the regulation of plant growth and organelle gene expression. Some PPR genes are related to fertility restoration in plants, but no detailed information is available for Gossypium. In the present study, we identified 482 and 433 PPR homologs in the G. raimondii (D5) and G. arboreum (A2) genomes, respectively. Most PPR homologs were evenly distributed across the chromosomes. In an evolutionary analysis of PPR genes from the G. raimondii (D5), G. arboreum (A2) and G. hirsutum genomes, 8 PPR genes clustered together with restorer genes of other species. Most cotton PPR genes were characterized by the absence of introns, a high proportion of α-helix and the classical tertiary structure of PPR proteins. Based on bioinformatics analyses, the 8 PPR genes were predicted to be targeted to the mitochondrion and to encode typical P subfamily proteins with protein binding activity and functions in organelle RNA metabolism. As further verified by RNA-seq and qRT-PCR analyses, 2 PPR candidate genes, Gorai.005G0470 (D5) and Cotton_A_08373 (A2), were up-regulated in the fertile line relative to the sterile line. These results provide new insights into PPR gene evolution in Gossypium.


Introduction

The cotton genus, Gossypium, is home to the most important fiber crop plants in the world,

with four of the ~53 species cultivated, two diploid and two allotetraploid. The genus originated

approximately 5-10 million years ago (Mya), subsequently diversifying into ~ 46 diploid species

(allocated into 8 monophyletic genome groups, designated A-G and K) and 7 allotetraploid species

(Wendel and Grover 2015; Chen et al. 2016; Chen et al. 2017c; Gallagher et al. 2017).

Allopolyploid Gossypium is the result of the transoceanic dispersal of an A-genome species

(resembling G. arboreum, A2), which subsequently hybridized with a native D-genome species

(resembling G. raimondii, D5) in the New World and experienced chromosome doubling (Wendel

1989; Chen et al. 2017a; Chen et al. 2017b).

Heterosis is widely exploited in crop plants to increase yield potential and improve quality, for example through the three-line system (sterile line, maintainer line and restorer line) used to develop hybrid cotton (Bentolila et al. 2002). It is well known that plant fertility is co-determined by mitochondrial and nuclear genes (Dewey et al. 1987; Schnable and Wise 1998; Carlsson et al. 2008; Galtier 2011; Suzuki et al. 2013). Most nuclear restorer genes have been reported as members of the PPR (pentatricopeptide repeat) gene family, such as Rf-PPR592 in petunia (Bentolila et al. 2002; Koizuka et al. 2003; Gillman et al. 2007), Rfo in CMS-Ogura radish (Brown et al. 2003; Desloire et al. 2003; Koizuka et al. 2003) and another tightly linked restorer gene, RsRf (Wang et al. 2013). Similarly, PPR-like Rf genes have also been identified, such as Rf1 in CMS-BT rice (Kazama and Toriyama 2003; Akagi et al. 2004; Komori et al. 2004; Wang et al. 2006), Rf5 in CMS-HL rice (Hu et al. 2012), Rf3 in CMS-S maize (Zabala et al. 1997), and PPR13 in A1 sorghum (a candidate gene of Rf1) (Klein et al. 2005). In addition, some restorer genes encode non-PPR proteins, such as Rf2 in CMS-T maize (Cui et al. 1996; Liu et al. 2001), Rf2 in CMS-LD rice (Itabashi et al. 2011), Rf17 in CMS-CW rice (Fujii and Toriyama 2009), and Rf1 (bvORF20) in sugar beet (Matsuhira et al. 2012).

PPR proteins consist of a series of tandemly arranged PPR motifs, each a degenerate sequence of 35 amino acids, some positions of which are highly conserved (Small and Peeters 2000), and evolved from earlier TPR (tetratricopeptide repeat) ancestors (Barkan and Small 2014). PPR genes are widespread in plants (Lurin et al. 2004; Wang et al. 2006), and PPR gene families have had a significant influence on the evolution of plant organellar genomes, especially on organelle-specific RNA metabolism (Germain et al. 2013). The PPR gene family is divided into two subfamilies, the PLS and P subfamilies. The PLS subfamily is itself subdivided into four groups: the PLS group, E group, E+ group and DYW group (Lurin et al. 2004). Most PPR genes contain no introns (Lurin et al. 2004) and encode organelle-targeting peptides at the N-terminus (Lurin et al. 2004). PPR gene functions fall mainly into four categories: 1) regulating the expression of chloroplast and mitochondrial genes, such as HCF152 in A. thaliana (Meierhoff et al. 2003; Nakamura et al. 2003); 2) participating in plant-specific RNA metabolism (mainly PPR genes of the PLS subfamily), such as CRR4 in A. thaliana (Hashimoto et al. 2003; Howell et al. 2007); 3) regulating the embryonic development of higher plants, such as EMB175 in A. thaliana (Cushing et al. 2005); and 4) affecting the fertility restoration of cytoplasmic male sterility in plants, such as Rf1 in Oryza sativa (Wang et al. 2006). Compared with other kinds of PPR genes, the PPR genes that serve as restorer genes usually cluster together with homologous sequences (also known as Rf-like or RFL genes), which leads to a distinctive, dynamic mode of evolution (Geddy and Brown 2007; O'Toole et al. 2008; Fujii et al. 2011).

The CMS line of cotton with G. harknessii sterile cytoplasm (CMS-D2-2) is sporophytically male sterile and is restored by a single dominant gene, Rf1 (Zhang and Stewart 2004). Rf1 was located on chromosome 19 (Li et al. 2007), namely, the LGD08 linkage group in the Dt sub-genome (chromosome D05) (Wang et al. 2009). The most recent comparison between the chromosomes of allotetraploid cotton and those of diploid G. raimondii indicated that chromosome 19 (chromosome D05) of allotetraploid cotton corresponds to chromosome 9 of G. raimondii (Zhao et al. 2012; Zhang et al. 2014). Subsequently, an Rf1 candidate gene, Cotton_D_gene_10013437, was shown to carry a 9-nt insertion in the 3’ UTR and a SNP in the restorer line compared with non-restorer lines (Wu et al. 2014). Up to now, restorer genes for cytoplasmic male sterility in plants have mainly been obtained through map-based cloning, and some progress has been made in screening and mapping molecular markers associated with cotton restorer genes. With the rapid growth of high-throughput biological data, whole-genome and transcriptome sequencing combined with bioinformatics analysis may be a feasible way to explore the fertility restorer genes of cotton cytoplasmic male sterility (CMS). Given the close relationship of PPR genes to Rf genes in other species, we identified the PPR gene families in the G. arboreum (A2) and G. raimondii (D5) genomes. From an evolutionary perspective, we then obtained candidate cotton PPR genes that cluster with Rf-PPR genes of other species. In addition, we analyzed the evolutionary pressure, functional annotation, subfamily classification and subcellular localization of these PPR genes. Finally, the differential expression of PPR candidate genes was analyzed in sterile and fertile cotton materials using RNA-seq transcriptome data and qRT-PCR validation. We expect the results to lay a foundation for further research on the molecular mechanisms of interaction between the restorer gene and CMS-D2 cytoplasm in cotton.

Materials and methods


Plant materials

This experiment used the G. harknessii CMS line 2074A, the G. hirsutum CMS line 2074S, their maintainer 2074B, and two fertile F1 hybrids derived from crossing each CMS line with the restorer line E5903 (Li et al. 2013).

2074A, a CMS line with G. harknessii (D2-2) male-sterile cytoplasm, was bred by backcrossing the original sterile line DES-HAMS277 (Meyer 1975) for more than 20 generations with the upland cotton cultivar ‘2074B’ (Lei et al. 2013).

2074S, a CMS line with G. hirsutum (AD1) male-sterile cytoplasm, was derived by hybridizing line X658 with G. hirsutum L. (AD1) and backcrossing for 17 generations.

2074B, the maintainer of 2074A and 2074S with G. hirsutum fertile cytoplasm, is the upland cotton cultivar ‘Sumian 20’.

E5903, a nuclear restorer line with a normal nuclear genome and normal fertile G. harknessii cytoplasm, originated from DES-HAF277 (Meyer 1975) by inbreeding for more than 30 generations.

F1: fertile F1 materials were generated by hybridizing the CMS lines 2074A and 2074S with the restorer line E5903 (FA, 2074A × E5903; FS, 2074S × E5903).

Identification of PPR gene family and chromosome localization analysis

The genome sequences, CDS sequences and amino acid sequences of G. raimondii, G. arboreum and G. hirsutum were downloaded from Phytozome (http://www.phytozome.net/) and the Cotton Genome Project (CGP) (http://cgp.genomics.org.cn/). To identify members of the PPR protein family in the Gossypium genome assemblies, all available PPR domain sequences from the Pfam database (http://pfam.xfam.org) were collected and used to build a Hidden Markov Model (HMM) profile with the hmmbuild program of the HMMER package (v3.1b1, http://hmmer.org). This HMM profile was used to identify members of the PPR family among the cotton amino acid sequences obtained from the high-quality draft genomes of G. raimondii, G. arboreum and G. hirsutum (Paterson et al. 2012; Li et al. 2014; Zhang et al. 2015). Sequences containing 10 or more P-class PPR motifs were retained for further analyses, as a previous study has shown that RFL genes are primarily composed of tandem arrays of 15 to 20 PPR motifs (Fujii et al. 2011). The locations of PPR genes on chromosomes were determined by local BLAST (Altschul et al. 1990).
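
The motif-count filter described above can be sketched in Python as follows. This is a minimal illustration, not the authors' script: it assumes the search was run with hmmsearch's per-domain table output (`--domtblout`), that each table row corresponds to one PPR motif hit, and that a per-domain independent E-value cutoff of 1e-3 is applied. The cutoff and the function names `count_motif_hits` and `retain_candidates` are hypothetical; only the threshold of 10 motifs comes from the text.

```python
from collections import Counter


def count_motif_hits(domtbl_lines, max_ievalue=1e-3):
    """Count per-domain profile hits for each protein.

    domtbl_lines: iterable of text lines in hmmsearch --domtblout format.
    max_ievalue: assumed per-domain cutoff (not stated in the source).
    """
    counts = Counter()
    for line in domtbl_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target = fields[0]            # column 1: target (protein) name
        i_evalue = float(fields[12])  # column 13: independent E-value of the domain
        if i_evalue <= max_ievalue:
            counts[target] += 1
    return counts


def retain_candidates(counts, min_motifs=10):
    """Keep proteins with at least min_motifs motif hits (10 in the text)."""
    return sorted(name for name, n in counts.items() if n >= min_motifs)
```

A protein with ten qualifying hits would be kept, while one with three would be dropped. Distinguishing P-class motifs from L or S motifs requires class-specific profiles and is not covered by this sketch.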

Phylogenetic analysis


The amino acid sequences of 6 Rf-PPR genes from 5 plant species (rapeseed (PPR_B_L1), radish (Rfo_PPR B), Arabidopsis (RPF1), petunia (Rf_PPR592) and rice (Rf1a and Rf1b)) were downloaded from the NCBI database. We separately performed 26 single-chromosome phylogenetic analyses of the PPR proteins of the G. raimondii (D5) and G. arboreum (A2) genomes together with the 6 Rf-PPR genes mentioned above, using amino acid sequences. The PPR genes on each chromosome of D5 and A2 that clustered with the 6 confirmed Rf-PPR genes were subsequently used in a comprehensive analysis together with the 6 Rf-PPR genes and 15 G. hirsutum (AD1) PPR protein sequences retrieved from the NCBI database. As an important supplement, PPR genes predicted on chromosome D05 of the G. hirsutum (AD1) genome (data unpublished) were also used for a single-chromosome phylogenetic analysis. First, amino acid sequences were aligned with MAFFT (Katoh and Standley 2013) using default parameters. Then, phylogenetic trees were built with the GTR + G + R model by the Maximum Likelihood method in MEGA 5.05 (Tamura et al. 2011), with 1000 bootstrap replicates.

Selective constraints analysis

Homologous PPR gene pairs between the G. raimondii (D5) and G. arboreum (A2) genomes were identified by BLAST alignment, taking the hit with the highest identity. The alignment FASTA files were converted to PAML format using the software DAMBE (Xia and Xie 2001). The non-synonymous substitution rate (dN), the synonymous substitution rate (dS) and the dN/dS ratio were calculated using the yn00 program in PAML (Yang and Nielsen 2000). GO annotations were carried out with agriGO (http://bioinfo.cau.edu.cn/agriGO/analysis.php). Taking the cotton genome locus set (Phytozome) as the reference, we functionally annotated the corresponding PPR genes of the G. raimondii (D5) genome by Singular Enrichment Analysis (SEA), using the hypergeometric test and the Yekutieli (FDR under dependency) multiple-testing adjustment. The significance level was set to 0.05.
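
As a reading aid for the dN/dS ratio, the short Python sketch below shows the conventional interpretation: a ratio below 1 indicates purifying selection, a ratio near 1 neutral evolution, and a ratio above 1 positive selection. It does not reproduce the yn00 estimation itself. The function name `selection_mode`, the tolerance and the input values in the usage note are hypothetical.

```python
def selection_mode(dn, ds, tol=1e-9):
    """Return (omega, label) for one gene pair.

    dn, ds: non-synonymous and synonymous substitution rates,
            e.g. as reported by an external tool such as yn00.
    tol:    assumed tolerance for treating omega as exactly 1.
    """
    if ds <= 0:
        # The ratio is undefined when no synonymous change is estimated.
        return None, "undefined"
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return omega, "neutral"
    return omega, "purifying" if omega < 1.0 else "positive"
```

For instance, illustrative values dN = 0.02 and dS = 0.10 give a ratio of about 0.2, which would be labelled as purifying selection.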

GO annotation and subcellular localization analysis

GO (Gene Ontology) annotation of PPR genes was performed with Blast2GO. First, the CDS sequences of the PPR candidate genes were aligned against the nr database and then annotated. The E-value cut-off was set to 1e-6. Subcellular localization was predicted using the TargetP 1.1 Server (http://www.cbs.dtu.dk/services/TargetP/), Predotar (https://urgi.versailles.inra.fr/predotar/predotar.html) and ProtComp v. 9.0 (http://linux1.softberry.com/berry.phtml).
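
One simple way to combine calls from three predictors is a majority vote, sketched below in Python. This is an assumption made for illustration: the text does not state how disagreements between TargetP, Predotar and ProtComp were resolved, and the function name `consensus_location` and the label strings are hypothetical.

```python
from collections import Counter


def consensus_location(predictions):
    """Majority vote over per-tool localization calls.

    predictions: dict mapping tool name -> predicted compartment label.
    Returns the label chosen by more than half of the tools,
    or "unsure" when no label reaches a majority.
    """
    if not predictions:
        return "unsure"
    votes = Counter(predictions.values())
    location, n = votes.most_common(1)[0]
    return location if n > len(predictions) / 2 else "unsure"
```

With two tools calling "mitochondrion" and one calling "chloroplast", the sketch returns "mitochondrion"; with three different calls it returns "unsure".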

Subfamily analysis


PPR domain analysis was based on HMM profiles defined on the 7 conserved domains of the PPR gene family in Arabidopsis (P, L, L2, S, E, E+ and DYW), searched with the hidden Markov model software HMMER3.0 (Mistry et al. 2013). Subsequently, each PPR sequence was inspected manually for its arrangement of PPR motifs. The E-value threshold in hmmsearch was set to less than 1e-10.
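
The subfamily assignment can be illustrated with a simplified Python rule set that follows the grouping of Lurin et al. (2004): proteins with only P motifs fall in the P subfamily, and proteins of the PLS subfamily are grouped by their C-terminal domains. This is a sketch under that simplification, not the manual procedure used in the study; the function name `classify_ppr` and the label strings are hypothetical.

```python
def classify_ppr(motifs):
    """Assign a PPR protein to a subfamily/group from its motif labels.

    motifs: ordered list of motif labels (N- to C-terminus) drawn from
            P, L, L2, S, E, E+ and DYW.
    """
    present = set(motifs)
    # C-terminal domains decide the PLS group, most specific first.
    if "DYW" in present:
        return "PLS subfamily, DYW group"
    if "E+" in present:
        return "PLS subfamily, E+ group"
    if "E" in present:
        return "PLS subfamily, E group"
    if present & {"L", "L2", "S"}:
        return "PLS subfamily, PLS group"
    if present == {"P"}:
        return "P subfamily"
    return "unclassified"
```

A tandem array of 15 P motifs, as is typical of RFL proteins, would be assigned to the P subfamily by this rule set.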

RNA-seq and qRT-PCR analyses

RNA-seq data from young buds of the CMS line 2074A, the maintainer line 2074B and the fertile material F1 (2074A × E5903 (restorer line)) (unpublished) were used to analyze the expression of PPR genes. Expression levels were estimated as RPKM (reads per kilobase of exon model per million mapped reads) values. The expression diagram was drawn with the gplots package in R.
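
For reference, RPKM follows directly from its definition: the read count of a gene divided by its exon length in kilobases and by the total number of mapped reads in millions. A minimal Python sketch is given below; the function name `rpkm` and the numbers in the usage note are illustrative and are not taken from the study's data.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    read_count:         reads mapped to the gene's exons
    exon_length_bp:     total exon length of the gene in base pairs
    total_mapped_reads: all mapped reads in the library
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (reads -> million reads)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)
```

For example, 500 reads on a gene with 2000 bp of exons in a library of 10 million mapped reads corresponds to an RPKM of 25.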

Total RNA was extracted from young buds of 6 cotton materials using an improved CTAB-SDS method: the CMS lines 2074A and 2074S, their maintainer line 2074B, the restorer line E5903, and the fertile F1 hybrids (FA, 2074A × E5903; FS, 2074S × E5903). Genomic DNA digestion and reverse transcription were carried out using the PrimeScript™ RT reagent Kit with gDNA Eraser (Perfect Real Time) RR047A (TaKaRa). The primers used for qRT-PCR were designed with Primer Premier 5 and synthesized by Sangon Biotech (Additional file 5: Table S5). Real-time PCR was performed using the SYBR® Premix Ex Taq™ II (Tli RNaseH Plus) RR820A kit (TaKaRa) on an Applied Biosystems 7500 Real-Time PCR System. The procedure comprised 3 stages: stage 1, 95 °C for 30 s, 1 cycle; stage 2, 95 °C for 5 s and 60 °C for 35 s, 40 cycles; stage 3, 95 °C for 15 s, 60 °C for 1 min and 95 °C for 35 s, 1 cycle. Taking the cotton housekeeping gene UBQ7 as the internal control, we analyzed the relative expression of 8 PPR candidate genes using the 2^-ΔΔCt method. Each sample was run in 3 replicates.
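
The 2^-ΔΔCt calculation named above can be written out in a few lines of Python. This is a sketch of the standard formula with UBQ7 as the reference gene and one sample chosen as the calibrator; the function name `relative_expression` and the Ct values in the usage note are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method.

    ct_target_*: Ct of the gene of interest
    ct_ref_*:    Ct of the reference gene (UBQ7 in the text)
    *_sample:    the material being tested
    *_control:   the calibrator material
    """
    delta_sample = ct_target_sample - ct_ref_sample     # dCt of the test sample
    delta_control = ct_target_control - ct_ref_control  # dCt of the calibrator
    delta_delta = delta_sample - delta_control          # ddCt
    return 2.0 ** (-delta_delta)
```

With illustrative Ct values of 24 (target) and 18 (reference) in the test sample and 26 and 18 in the calibrator, ΔΔCt is -2 and the relative expression is 4. The calibrator compared with itself always gives 1.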

Results and Discussion

Identification and chromosome distribution of PPR gene family

In total, 482 and 433 PPR genes were identified in G. raimondii (D5) and G. arboreum (A2), respectively, by genome-wide analyses (table 1). The number of PPR genes varied among the 13 chromosomes of each genome. The maximum numbers of PPR genes on a single chromosome were 75 in G. raimondii (D5) and 49 in G. arboreum (A2), located on chromosome 9 and chromosome 6, respectively (figure 1). Chromosome 3 of G. raimondii (D5) and chromosome 2 of G. arboreum (A2) contained the fewest PPR genes, 20 and 18, respectively (figure 1). Overall, PPR genes in the two cotton species were evenly distributed, as has been observed on the 5 chromosomes of Arabidopsis (Aubourg et al. 2000; Lurin et al. 2004). However, some PPR genes clustered on certain chromosomes, such as chromosomes 4, 5, 6 and 10 in the G. raimondii (D5) genome and chromosomes 4 and 5 in the G. arboreum (A2) genome. Such clustered PPR genes are typically involved in Rf loci, as observed in other plants (Bentolila et al. 2002; Brown et al. 2003; Giancola et al. 2003; Komori et al. 2004; Wang et al. 2006).

Phylogenetic analyses

Restorer of fertility-like (RFL) PPR genes have been reported in several plant species, such as Rf-PPR592 in petunia, Rfo in radish, RPF1 in Arabidopsis, and Rf1a and Rf1b in rice (Bentolila et al. 2002; Brown et al. 2003; Giancola et al. 2003; Komori et al. 2004; Wang et al. 2006). Taking those Rfs as outgroups, we performed 26 single-chromosome phylogenetic analyses of PPR genes in the G. raimondii (D5) and G. arboreum (A2) genomes separately (Additional file 1: figure S1). In total, we obtained 36 and 19 candidate restorer of fertility-like (RFL) PPR genes that clustered together with the 6 Rfs in the G. raimondii (D5) and G. arboreum (A2) genomes, respectively (Additional file 2: table S1 and table S2).

Furthermore, a comprehensive phylogenetic analysis was performed with the 36 PPR candidate genes from G. raimondii (D5), the 19 PPR candidate genes from G. arboreum (A2) and 15 PPR genes from G. hirsutum (AD1) (data unpublished), with the 6 Rf genes as outgroups (figure 2). Eight PPR genes from the G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1) genomes clustered into one clade. Among them, two homologous pairs, Gorai.005G0470 (D5) and Cotton_A_08373 (A2), and Gorai.007G1431 (D5) and Cotton_A_26557 (A2), along with Gorai.006G2471 (D5), had a close evolutionary relationship with Rf_PPR592 in petunia. Gorai.010G0536 (D5) and GhK14 (AD1) were sister to PPRB_L1 of rapeseed, Rfo_PPRB of radish and RPF1 of Arabidopsis. GhPPR3 (AD1) clustered with Rf1a and Rf1b in rice (figure 2). These 8 PPR candidate genes might be associated with fertility restoration in cotton, as studies have shown that Rfs and highly homologous RFL genes in plant species always form a single evolutionary clade (Fujii et al. 2011; Melonek et al. 2016; Sykes et al. 2017).

Fertility of the cytoplasmic male sterility (CMS) line with G. harknessii (D2-2) cytoplasm is restored by Rf1 (Feng et al. 2005), which was mapped to chromosome D05 (Liu et al. 2003; Li et al. 2007; Wang et al. 2009). Molecular markers tightly linked to Rf1 include UBC679-700 and BNL4047-170 (Yin et al. 2006), CIR179-200 and CM042-150 (Li et al. 2007), and Y1107-350 and TRAP425 (Wang et al. 2009); however, none of these markers could be aligned to the anchored chromosomes of G. hirsutum (Zhang et al. 2015). In addition, we predicted 55 PPR genes on chromosome D05 of the G. hirsutum (AD1) genome (data unpublished). After phylogenetic analysis of these 55 PPR genes, we obtained 3 PPR candidate genes that clustered with the 6 restorer genes (Additional file 1: figure S2). These results indicate that these PPR candidate genes might have a closer relationship with the 6 Rfs from the 5 other plant species.

Selective constraints on PPR genes

G. raimondii and G. arboreum diverged from a common ancestor about 10 million years ago and are very similar in gene number and sequence (Li et al. 2014). We found 377 pairs of homologous PPR genes (Additional file 3: table S3) between the two genomes; that is, 78% of the PPR genes in the G. raimondii (D5) genome were homologous to 87% of the PPR genes in the G. arboreum (A2) genome, suggesting that most PPR genes in the two genomes co-evolved.
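
The two percentages follow from the counts reported in the text (377 homologous pairs, 482 D5 PPR genes, 433 A2 PPR genes), assuming each pair contributes one distinct gene per genome. A short Python check is shown below; the function name `share_percent` is hypothetical.

```python
def share_percent(n_pairs, n_genes):
    """Percentage of a genome's PPR genes that have a homolog in a pair."""
    return 100.0 * n_pairs / n_genes


d5_share = share_percent(377, 482)  # about 78.2
a2_share = share_percent(377, 433)  # about 87.1
```

Rounded to whole numbers, these give the 78% and 87% quoted above.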

To study the evolutionary pattern of the PPR gene families in cotton, we calculated the nucleotide nonsynonymous substitution rate (dN), the nucleotide synonymous substitution rate (dS) and the dN/dS value (Jore et al. 2011). Most PPR genes were under purifying selection (figure 3A, Additional file 3: table S3). Interestingly, the average dN and dS values of the RFL genes (36 D5-RFLs and 19 A2-RFLs) were higher than those of other PPR genes, as also reported by Fujii et al. (2011). The D5-RFLs evolved faster than other PPR genes, whereas the A2-RFLs had a lower evolutionary rate than other PPR genes (figure 3C, Additional file 3: table S3). It is likely that the restorer gene was derived from the D sub-genome (Wu et al. 2014), especially in cotton lines with D-genome sterile cytoplasm, such as 2074A in our study, which contains the G. hirsutum nuclear genome and G. harknessii sterile cytoplasm, resulting in a specific nuclear-cytoplasmic combination. This may be a much more complex question than the difference between polyploid and diploid cotton, because most cotton CMS lines were created by hybridization between different species.

In addition, to clarify the relationship between the evolutionary pattern of PPR genes and their biological functions in cotton, we performed GO annotation of the A2-D5 homologous PPR genes (Additional file 4: figure S3) and categorized them by dN, dS and dN/dS value. PPR genes related to localization had the lowest dN/dS values (figure 3B, Additional file 3: table S3), which suggests that this kind of PPR gene was under evolutionary constraint during the divergence of G. raimondii (D5) and G. arboreum (A2). Most PPR genes were targeted to mitochondria and a few to chloroplasts, consistent with the organelle-targeting peptide sequence at the N-terminus of most PPR proteins (Lurin et al. 2004).

Subcellular localization and Subfamily analysis of PPR candidate genes

Of the 36 D5-RFLs, 19 A2-RFLs and 15 AD1 PPR genes in cotton and the 6 Rf genes from other species, most were targeted to mitochondria and a few to chloroplasts. These results were supported by subcellular localization predictions from three programs (TargetP, Predotar and ProtComp). That is, 72% of the PPR genes were predicted in mitochondria, 10% in chloroplasts, and 16% overlapped (figure 4), in agreement with the observation that most Rf-PPR genes are targeted to mitochondria (Bentolila et al. 2002; Komori et al. 2004; Lurin et al. 2004).

The PPR gene family is divided into the PLS subfamily and the P subfamily, and the PLS subfamily is further subdivided into four groups: the PLS group, E group, E+ group and DYW group (Lurin et al. 2004). In our research, we analyzed the PPR motif arrangement of the 36 D5-RFLs, 19 A2-RFLs and 15 G. hirsutum (AD1) PPR genes in cotton and the 6 Rf genes from other species using HMM profiles of the PPR gene family in Arabidopsis thaliana (defined by 7 conserved domains: P, L, L2, S, E, E+ and DYW). The 6 Rf-PPR genes belonged to the P subfamily (Bentolila et al. 2002), and the 36 D5-RFLs and 19 A2-RFLs were also assigned to the P subfamily. However, the 15 G. hirsutum (AD1) PPR genes covered all groups of the PPR gene family, in which a variety of classical PPR domains were arranged in a particular order (Lurin et al. 2004) (Additional file 5: table S4).

RNA-seq and qRT-PCR analyses of PPR candidate gene expressions

To test whether these PPR candidate genes are associated with fertility restoration in cotton, we analyzed the expression of the 36 D5-RFLs, 19 A2-RFLs and 15 AD1 PPR genes based on RNA-seq data from young buds of the CMS line 2074A, the maintainer line 2074B and the fertile material FA (unpublished). Unlike the maintainer line 2074B, which contains normal fertile cytoplasm from G. hirsutum, the CMS line 2074A and the fertile material FA share the same male-sterile cytoplasm from G. harknessii. However, when crossed with the restorer line E5903, which has a normal fertile nuclear genome and cytoplasm from G. harknessii, the sterile line 2074A produced the fertile FA owing to the combination of the dominant gene Rf with the original recessive non-functional allele rf. All three cotton lines have nearly isogenic nuclear genomes comprising the A sub-genome and D sub-genome; that is, they may carry different alleles of the same restorer gene and/or express it differentially. In our study, most of these PPR candidate genes were highly expressed in FA but expressed at low levels in the maintainer line and the sterile line (figure 5). Furthermore, 8 of these PPR candidate genes were up-regulated in FA relative to the sterile line, which suggests that these candidate genes are likely related to fertility restoration in cotton (table 2). Some restorer genes reduce the abundance of CMS-related transcripts at the transcriptional or post-transcriptional level, such as Rf-PPR592 in CMS-RM petunia (Bentolila et al. 2002). In addition, some restorer genes function at the genetic or protein level, such as Fr in CMS-Sprite bean (Mackenzie and Chase 1990; Janska et al. 1998) and Rf3 in CMS-WA rice (Luo et al. 2013); thus, further experiments are still needed to reveal the molecular mechanism of fertility restoration.

Furthermore, to validate the RNA-seq expression data experimentally, we carried out qRT-PCR to analyze the differential expression of PPR candidate genes in the CMS lines 2074A and 2074S, their maintainer line 2074B, the restorer line E5903 and the fertile F1 hybrids (FA, 2074A × E5903; FS, 2074S × E5903). Taking the cotton housekeeping gene UBQ7 as the internal control, we analyzed the relative expression of 8 PPR candidate genes in young buds of the 6 cotton materials by real-time fluorescent quantitative PCR. The expression of two PPR candidate genes, Gorai.005G0470 (D5) and Cotton_A_08373 (A2), was higher in FA than in the sterile line 2074A, and the two genes showed similar expression patterns across the 6 cotton materials (figure 6). By RNA-seq, the fold increases of these two genes in FA relative to the sterile line 2074A were 3.45 and 12.59, respectively. In addition, these two PPR genes share high homology, which indicates that their common ancestor gene appeared before the divergence of the D5 and A2 genomes. During subsequent evolution, they were under purifying selection (Additional file 3: table S3). The phylogenetic analyses showed that they have a close evolutionary relationship with the restorer gene Rf_PPR592 in petunia (Bentolila et al. 2002). In this study, the progeny of the sterile line 2074A became the fertile FA through the putative Rf gene from the D2 nuclear genome. Therefore, Gorai.005G0470, derived from D5, is more likely than Cotton_A_08373 from A2 to be the candidate Rf gene for the G. harknessii CMS line 2074A. We hope that our results will be helpful for studying restorer genes in cotton.

Conclusion

In total, 482 and 433 PPR genes were identified in two diploid cotton species, G. raimondii (D5) and G. arboreum (A2), respectively. They were evenly distributed over the chromosomes, with a few clusters. Phylogenetic analyses yielded 36 D5-RFLs and 19 A2-RFLs, of which the D5-RFLs evolved faster than other PPR genes. These RFLs, together with 15 AD1 PPR genes, were included in a comprehensive phylogenetic analysis, which identified 8 cotton PPR candidate genes clustering together with 6 Rf genes from other plant species. Two of the PPR candidate genes, Gorai.005G0470 (D5) and Cotton_A_08373 (A2), were confirmed by RNA-seq and qRT-PCR analyses to be up-regulated in fertile lines relative to the sterile line. Our study provides preliminary insights into PPR gene evolution and the RFL genes in cotton.


Figure legends

Figure 1. Distribution of PPR gene numbers over chromosomes in the G. raimondii (D5) and G. arboreum (A2) genomes. The number of PPR genes on each of the 13 chromosomes of G. raimondii (D5) is denoted by “●”, and that of G. arboreum (A2) by “+”. In addition to the PPR genes on the 13 chromosomes, a few PPR genes whose chromosome location had not been identified are shown as “others”.

Figure 2. Comprehensive phylogenetic analyses of PPR genes from G. raimondii, G. arboreum and G. hirsutum L. by the Maximum Likelihood method. Genes are shown with different shapes according to species: box, G. raimondii; dot, G. arboreum; diamond, G. hirsutum. The outgroups are six restorer genes from five different species, Petunia × hybrida, Oryza sativa ssp. indica, B. napus, R. sativus and A. thaliana. They are marked by triangles, and the corresponding branches are in bold. Genes with a close evolutionary relationship to the restorer genes are marked with solid shapes.

Figure 3. Nucleotide substitution rates of homologous PPR genes in the G. raimondii (D5) and G. arboreum (A2) genomes. (a) Density distribution of dN/dS values of homologous PPR genes between the G. raimondii (D5) and G. arboreum (A2) genomes. (b) Average nucleotide substitution rates of RFLs and other PPR genes in the G. raimondii (D5) and G. arboreum (A2) genomes. (c) Box plot of the distribution of dN/dS values of D5-A2 PPR homologs across second-level GO terms.

Figure 4. Subcellular localization of PPR genes in the G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1) genomes. TP, PD and PC represent the three programs TargetP, Predotar and ProtComp, respectively. Dark blue denotes mitochondria, light blue chloroplasts, and white unsure.


Figure 5. Expression analysis of PPR candidate genes in the G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1) genomes. Based on RNA-seq data of the sterile line 2074A, the maintainer line 2074B and the fertile material AE1 (F1 [2074A × E5903]), the expression of PPR candidate genes was calculated as RPKM. Gene expression is denoted by different colors: green represents relatively down-regulated and red relatively up-regulated. Two PPR candidate genes in the G. arboreum (A2) genome and four PPR candidate genes in the G. raimondii (D5) genome are marked by red arrows on the right and were relatively up-regulated in AE1 (PPR-21 [Gorai.010G053600.1] and PPR-22 [Gorai.010G053600.2] are two different transcripts of the same gene [Gorai.010G0536], so there are seven red arrows). Two green arrows mark genes down-regulated in AE1.

Figure 6. Relative expression analysis of 8 PPR candidate genes in buds of 6 cotton materials differing in fertility. The expression in buds of 2074A was used as the control, and UBQ7 was used as the reference gene. Values were calculated with the 2^-ΔΔCt method.


Additional files

Additional file 1: Figure S1. Single-chromosome phylogenetic analyses of PPR genes in the G. raimondii (D5) genome and the G. arboreum (A2) genome by the Maximum Likelihood method. Figure S2. Phylogenetic analysis of PPR genes on chromosome D05 of the G. hirsutum (AD1) genome. Box: G. raimondii; dot: G. arboreum; triangle: outgroups (Petunia × hybrida, Oryza sativa ssp. indica, B. napus, R. sativus and A. thaliana); solid: candidate PPR genes.

Additional file 2: Table S1. Information of PPR candidate genes derived from 13

chromosomes of G. raimondii (D5) genome. Table S2. Information of PPR candidate genes

derived from 13 chromosomes of G. arboreum (A2) genome.

Table S1 Information of PPR candidate genes derived from 13 chromosomes of G. raimondii (D5) genome

Chromosome  No. of genes  Gene            No. of sequences  Sequence(s)
chr01       1             Gorai.001G1316  1                 Gorai.001G131600.1
chr02       2             Gorai.002G0718  3                 Gorai.002G071800.1, Gorai.002G071800.2
                          Gorai.002G1010                    Gorai.002G101000.1
chr03       1             Gorai.003G1716  1                 Gorai.003G171600.1
chr04       3             Gorai.004G2907  3                 Gorai.004G290700.1
                          Gorai.004G2406                    Gorai.004G240600.1
                          Gorai.004G2438                    Gorai.004G243800.1
chr05       1             Gorai.005G0470  1                 Gorai.005G047000.1
chr06       2             Gorai.006G2252  2                 Gorai.006G225200.1
                          Gorai.006G2471                    Gorai.006G247100.1
chr07       1             Gorai.007G1431  1                 Gorai.007G143100.1
chr08       1             Gorai.008G0443  1                 Gorai.008G044300.1
chr09       4             Gorai.009G3762  4                 Gorai.009G376200.1
                          Gorai.009G2580                    Gorai.009G258000.1
                          Gorai.009G0058                    Gorai.009G005800.1
                          Gorai.009G1519                    Gorai.009G151900.1
chr10       4             Gorai.010G2281  9                 Gorai.010G228100.1, Gorai.010G228100.2
                          Gorai.010G0536                    Gorai.010G053600.1, Gorai.010G053600.2
                          Gorai.010G0325                    Gorai.010G032500.1, Gorai.010G032500.2, Gorai.010G032500.3
                          Gorai.010G0722                    Gorai.010G072200.1, Gorai.010G072200.2
chr11       10            Gorai.011G1557  11                Gorai.011G155700.1
                          Gorai.011G1515                    Gorai.011G151500.1
                          Gorai.011G1514                    Gorai.011G151400.1
                          Gorai.011G1511                    Gorai.011G151100.1
                          Gorai.011G1512                    Gorai.011G151200.1
                          Gorai.011G1451                    Gorai.011G145100.1
                          Gorai.011G1464                    Gorai.011G146400.1, Gorai.011G146400.2
                          Gorai.011G1466                    Gorai.011G146600.1
                          Gorai.011G1450                    Gorai.011G145000.1
                          Gorai.011G1465                    Gorai.011G146500.1
chr12       4             Gorai.012G1593  9                 Gorai.012G159300.1, Gorai.012G159300.2, Gorai.012G159300.3
                          Gorai.012G1205                    Gorai.012G120500.1, Gorai.012G120500.2
                          Gorai.012G0303                    Gorai.012G030300.1, Gorai.012G030300.2
                          Gorai.012G1494                    Gorai.012G149400.1, Gorai.012G149400.2
chr13       2             Gorai.013G0606  4                 Gorai.013G060600.1, Gorai.013G060600.2
                          Gorai.013G0109                    Gorai.013G010900.1, Gorai.013G010900.2
Total No.   36                            50

Table S2 Information of PPR candidate genes derived from 13 chromosomes of G. arboreum (A2) genome

Chromosome  No. of genes  Gene            No. of sequences  Sequences
chr01       1             Cotton_A_32157  1                 Cotton_A_32157
chr02       3             Cotton_A_37656  3                 Cotton_A_37656
                          Cotton_A_28832                    Cotton_A_28832
                          Cotton_A_00514                    Cotton_A_00514
chr03       2             Cotton_A_16847  2                 Cotton_A_16847
                          Cotton_A_18522                    Cotton_A_18522
chr04       2             Cotton_A_26557  2                 Cotton_A_26557
                          Cotton_A_03817                    Cotton_A_03817
chr05       1             Cotton_A_08373  1                 Cotton_A_08373
chr06       1             Cotton_A_27681  1                 Cotton_A_27681
chr07       1             Cotton_A_06850  1                 Cotton_A_06850
chr08       0             --              0                 --
chr09       1             Cotton_A_02931  1                 Cotton_A_02931
chr10       2             Cotton_A_04606  2                 Cotton_A_04606
                          Cotton_A_29300                    Cotton_A_29300
chr11       3             Cotton_A_13069  3                 Cotton_A_13069
                          Cotton_A_17619                    Cotton_A_17619
                          Cotton_A_14743                    Cotton_A_14743
chr12       0             --              0                 --
chr13       1             Cotton_A_26837  1                 Cotton_A_26837
others      1             Cotton_A_37173  1                 Cotton_A_37173
Total No.   19                            19

Additional file 3: Table S3. Nucleotide substitution rates of PPR homologous genes between

G. raimondii (D5) and G. arboreum (A2) genomes.

Table S3. Nucleotide substitution rates of PPR homologous genes between G. raimondii (D5) and G. arboreum (A2)

genomes

Number  gene 2 (A2) vs. gene 1 (D5)  dN  dS  dN/dS

1c 2 (Cotton_A_11317) vs. 1 (Gorai.009G1666) 0.017 0.009 1.980c

2 2 (Cotton_A_28224) vs. 1 (Gorai.013G0655) 0.016 0.014 1.140

3 2 (Cotton_A_09575) vs. 1 (Gorai.001G1210) 0.014 0.014 1.051

4 2 (Cotton_A_34843) vs. 1 (Gorai.012G1341) 0.015 0.014 1.042

5 2 (Cotton_A_31264) vs. 1 (Gorai.004G1601) 0.042 0.042 1.015

6 2 (Cotton_A_11768) vs. 1 (Gorai.012G1877) 0.022 0.022 1.013

7b 2 (Cotton_A_06304) vs. 1 (Gorai.002G0718) 0.014 0.014 0.999

8b 2 (Cotton_A_19072) vs. 1 (Gorai.012G0303) 0.035 0.037 0.957

9 2 (Cotton_A_01860) vs. 1 (Gorai.006G1315) 0.023 0.025 0.939

10 2 (Cotton_A_32325) vs. 1 (Gorai.001G2487) 0.019 0.021 0.935

11 2 (Cotton_A_27929) vs. 1 (Gorai.013G1240) 0.018 0.020 0.928

12 2 (Cotton_A_18719) vs. 1 (Gorai.001G2147) 0.046 0.050 0.917

13b 2 (Cotton_A_23084) vs. 1 (Gorai.011G1451) 0.015 0.018 0.851

14 2 (Cotton_A_02450) vs. 1 (Gorai.006G1772) 0.015 0.017 0.848

15 2 (Cotton_A_39828) vs. 1 (Gorai.001G2016) 0.016 0.019 0.832

16 2 (Cotton_A_20263) vs. 1 (Gorai.012G0213) 0.062 0.075 0.831

17 2 (Cotton_A_17523) vs. 1 (Gorai.011G0381) 0.013 0.015 0.825

18 2 (Cotton_A_35996) vs. 1 (Gorai.004G0976) 0.046 0.058 0.802

19 2 (Cotton_A_10814) vs. 1 (Gorai.008G0282) 0.120 0.152 0.792

20 2 (Cotton_A_11187) vs. 1 (Gorai.003G0412) 0.025 0.032 0.769

21 2 (Cotton_A_27680) vs. 1 (Gorai.008G1927) 0.018 0.024 0.763

22b 2 (Cotton_A_24724) vs. 1 (Gorai.006G2471) 0.027 0.036 0.748

23 2 (Cotton_A_24061) vs. 1 (Gorai.006G1651) 0.016 0.021 0.747

24 2 (Cotton_A_22811) vs. 1 (Gorai.005G1522) 0.019 0.026 0.745

25 2 (Cotton_A_40801) vs. 1 (Gorai.010G1366) 0.021 0.029 0.725

26 2 (Cotton_A_05635) vs. 1 (Gorai.001G0212) 0.018 0.025 0.718

27b 2 (Cotton_A_30591) vs. 1 (Gorai.003G1716) 0.032 0.046 0.710

28 2 (Cotton_A_03316) vs. 1 (Gorai.004G0658) 0.023 0.032 0.708

29 2 (Cotton_A_01798) vs. 1 (Gorai.006G1265) 0.013 0.019 0.699

30 2 (Cotton_A_26211) vs. 1 (Gorai.003G0971) 0.073 0.105 0.691

31 2 (Cotton_A_26989) vs. 1 (Gorai.004G0714) 0.012 0.017 0.684

32 2 (Cotton_A_33958) vs. 1 (Gorai.003G0875) 0.011 0.017 0.683

33 2 (Cotton_A_32710) vs. 1 (Gorai.008G2494) 0.015 0.022 0.674

34a 2 (Cotton_A_04606) vs. 1 (Gorai.009G1866) 0.016 0.023 0.672

35 2 (Cotton_A_22116) vs. 1 (Gorai.002G2139) 0.021 0.032 0.667

36 2 (Cotton_A_17776) vs. 1 (Gorai.008G2735) 0.019 0.029 0.666

37 2 (Cotton_A_17484) vs. 1 (Gorai.011G0686) 0.015 0.023 0.663

38 2 (Cotton_A_36527) vs. 1 (Gorai.013G1385) 0.019 0.029 0.660

39 2 (Cotton_A_02090) vs. 1 (Gorai.008G0415) 0.021 0.031 0.659

40 2 (Cotton_A_13788) vs. 1 (Gorai.010G1922) 0.014 0.021 0.656

41 2 (Cotton_A_10685) vs. 1 (Gorai.007G3051) 0.013 0.020 0.630

42 2 (Cotton_A_37988) vs. 1 (Gorai.001G2469) 0.014 0.022 0.629

43 2 (Cotton_A_33828) vs. 1 (Gorai.013G0985) 0.014 0.022 0.625

44 2 (Cotton_A_22551) vs. 1 (Gorai.004G0846) 0.015 0.024 0.614

45 2 (Cotton_A_06956) vs. 1 (Gorai.009G0970) 0.052 0.085 0.613

46 2 (Cotton_A_34686) vs. 1 (Gorai.001G1887) 0.016 0.026 0.611

47 2 (Cotton_A_23085) vs. 1 (Gorai.011G1450) 0.064 0.105 0.607

48 2 (Cotton_A_06080) vs. 1 (Gorai.012G0077) 0.077 0.127 0.607

49 2 (Cotton_A_23145) vs. 1 (Gorai.013G1741) 0.021 0.034 0.601

50 2 (Cotton_A_32339) vs. 1 (Gorai.007G0962) 0.016 0.027 0.598

51ab 2 (Cotton_A_26557) vs. 1 (Gorai.007G1431) 0.064 0.107 0.597

52 2 (Cotton_A_20633) vs. 1 (Gorai.007G0758) 0.032 0.055 0.590

53 2 (Cotton_A_18215) vs. 1 (Gorai.005G1471) 0.014 0.024 0.588

54 2 (Cotton_A_07425) vs. 1 (Gorai.009G4553) 0.017 0.029 0.588

55 2 (Cotton_A_16281) vs. 1 (Gorai.010G0532) 0.014 0.023 0.586

56 2 (Cotton_A_36956) vs. 1 (Gorai.012G0712) 0.012 0.021 0.585

57 2 (Cotton_A_16549) vs. 1 (Gorai.006G0786) 0.019 0.033 0.582

58 2 (Cotton_A_28680) vs. 1 (Gorai.002G1322) 0.016 0.027 0.579

59 2 (Cotton_A_30727) vs. 1 (Gorai.011G1778) 0.018 0.031 0.578

60 2 (Cotton_A_26765) vs. 1 (Gorai.001G1106) 0.011 0.020 0.575

61 2 (Cotton_A_24224) vs. 1 (Gorai.013G1917) 0.017 0.030 0.574

62 2 (Cotton_A_28020) vs. 1 (Gorai.001G0406) 0.015 0.025 0.572

63 2 (Cotton_A_04063) vs. 1 (Gorai.009G2942) 0.016 0.029 0.567

64 2 (Cotton_A_07524) vs. 1 (Gorai.011G0889) 0.012 0.021 0.565

65 2 (Cotton_A_10449) vs. 1 (Gorai.011G2299) 0.082 0.148 0.555

66 2 (Cotton_A_36721) vs. 1 (Gorai.002G0574) 0.079 0.144 0.553

67 2 (Cotton_A_30330) vs. 1 (Gorai.010G2078) 0.015 0.028 0.553

68 2 (Cotton_A_06176) vs. 1 (Gorai.008G1784) 0.012 0.022 0.551

69 2 (Cotton_A_20922) vs. 1 (Gorai.008G1195) 0.015 0.027 0.549

70 2 (Cotton_A_01973) vs. 1 (Gorai.007G0468) 0.015 0.028 0.549

71 2 (Cotton_A_33269) vs. 1 (Gorai.003G1164) 0.024 0.043 0.549

72 2 (Cotton_A_24707) vs. 1 (Gorai.011G2260) 0.016 0.029 0.543

73 2 (Cotton_A_07464) vs. 1 (Gorai.009G4512) 0.015 0.028 0.542

74 2 (Cotton_A_13263) vs. 1 (Gorai.001G1344) 0.015 0.028 0.539

75 2 (Cotton_A_25685) vs. 1 (Gorai.005G1720) 0.016 0.029 0.538

76 2 (Cotton_A_09296) vs. 1 (Gorai.007G2760) 0.020 0.038 0.536

77 2 (Cotton_A_40007) vs. 1 (Gorai.013G0825) 0.013 0.024 0.533

78 2 (Cotton_A_20041) vs. 1 (Gorai.013G2630) 0.019 0.036 0.532

79 2 (Cotton_A_13147) vs. 1 (Gorai.004G1887) 0.019 0.035 0.528

80 2 (Cotton_A_29248) vs. 1 (Gorai.010G2243) 0.015 0.029 0.526

81 2 (Cotton_A_34976) vs. 1 (Gorai.011G1657) 0.027 0.051 0.523

82 2 (Cotton_A_35366) vs. 1 (Gorai.002G1719) 0.016 0.030 0.521

83 2 (Cotton_A_14076) vs. 1 (Gorai.001G1484) 0.016 0.031 0.521

84 2 (Cotton_A_33069) vs. 1 (Gorai.006G0742) 0.066 0.127 0.520

85 2 (Cotton_A_13386) vs. 1 (Gorai.008G1951) 0.026 0.050 0.516

86 2 (Cotton_A_37189) vs. 1 (Gorai.005G1479) 0.017 0.032 0.515

87 2 (Cotton_A_25515) vs. 1 (Gorai.010G1802) 0.016 0.032 0.511

88 2 (Cotton_A_16155) vs. 1 (Gorai.011G0974) 0.012 0.023 0.511

89 2 (Cotton_A_15955) vs. 1 (Gorai.008G1022) 0.016 0.031 0.509

90ab 2 (Cotton_A_18522) vs. 1 (Gorai.013G0606) 0.075 0.148 0.509

91 2 (Cotton_A_27424) vs. 1 (Gorai.009G1101) 0.014 0.027 0.508

92 2 (Cotton_A_22374) vs. 1 (Gorai.013G1986) 0.012 0.023 0.506

93 2 (Cotton_A_04268) vs. 1 (Gorai.003G1421) 0.025 0.049 0.505

94 2 (Cotton_A_00752) vs. 1 (Gorai.005G2425) 0.019 0.038 0.503

95 2 (Cotton_A_40015) vs. 1 (Gorai.010G1439) 0.012 0.023 0.501

96 2 (Cotton_A_03759) vs. 1 (Gorai.007G1608) 0.010 0.020 0.501

97a 2 (Cotton_A_14743) vs. 1 (Gorai.006G0084) 0.018 0.037 0.498

98 2 (Cotton_A_23993) vs. 1 (Gorai.004G0731) 0.015 0.030 0.498

99 2 (Cotton_A_09794) vs. 1 (Gorai.003G0326) 0.016 0.032 0.491

100 2 (Cotton_A_21717) vs. 1 (Gorai.005G2322) 0.010 0.021 0.490

101 2 (Cotton_A_41153) vs. 1 (Gorai.013G1059) 0.017 0.035 0.490

102 2 (Cotton_A_09162) vs. 1 (Gorai.007G0312) 0.016 0.033 0.489

103 2 (Cotton_A_25415) vs. 1 (Gorai.009G4142) 0.016 0.032 0.488

104b 2 (Cotton_A_06370) vs. 1 (Gorai.012G1494) 0.012 0.024 0.486

105 2 (Cotton_A_37545) vs. 1 (Gorai.003G1355) 0.013 0.027 0.485

106 2 (Cotton_A_25609) vs. 1 (Gorai.005G1943) 0.024 0.049 0.483

107 2 (Cotton_A_01539) vs. 1 (Gorai.003G0072) 0.014 0.029 0.480

108 2 (Cotton_A_04493) vs. 1 (Gorai.008G1563) 0.014 0.030 0.479

109 2 (Cotton_A_33520) vs. 1 (Gorai.009G3754) 0.029 0.060 0.478

110 2 (Cotton_A_01088) vs. 1 (Gorai.009G0711) 0.019 0.039 0.477

111 2 (Cotton_A_08027) vs. 1 (Gorai.008G2080) 0.016 0.035 0.476

112 2 (Cotton_A_20227) vs. 1 (Gorai.008G1687) 0.011 0.023 0.476

113 2 (Cotton_A_07222) vs. 1 (Gorai.001G0292) 0.059 0.125 0.473

114 2 (Cotton_A_32576) vs. 1 (Gorai.008G0862) 0.016 0.033 0.472

115 2 (Cotton_A_13722) vs. 1 (Gorai.009G2054) 0.022 0.047 0.471

116 2 (Cotton_A_23893) vs. 1 (Gorai.003G1142) 0.038 0.080 0.471

117 2 (Cotton_A_12931) vs. 1 (Gorai.009G0926) 0.012 0.025 0.470

118 2 (Cotton_A_28094) vs. 1 (Gorai.005G1628) 0.017 0.035 0.469

119 2 (Cotton_A_39104) vs. 1 (Gorai.009G4026) 0.014 0.030 0.468

120 2 (Cotton_A_01590) vs. 1 (Gorai.003G0127) 0.047 0.101 0.468

121 2 (Cotton_A_14708) vs. 1 (Gorai.006G0114) 0.020 0.043 0.466

122 2 (Cotton_A_29057) vs. 1 (Gorai.012G0843) 0.013 0.028 0.465

123 2 (Cotton_A_00282) vs. 1 (Gorai.002G2674) 0.015 0.033 0.458

124 2 (Cotton_A_24369) vs. 1 (Gorai.007G2126) 0.013 0.028 0.455

125 2 (Cotton_A_36268) vs. 1 (Gorai.006G0702) 0.017 0.038 0.453

126 2 (Cotton_A_32770) vs. 1 (Gorai.007G1442) 0.011 0.023 0.452

127 2 (Cotton_A_28557) vs. 1 (Gorai.002G1203) 0.012 0.026 0.452

128 2 (Cotton_A_19260) vs. 1 (Gorai.008G2669) 0.014 0.032 0.451

129 2 (Cotton_A_26614) vs. 1 (Gorai.004G0254) 0.014 0.031 0.447

130 2 (Cotton_A_37026) vs. 1 (Gorai.008G0731) 0.045 0.100 0.445

131b 2 (Cotton_A_30160) vs. 1 (Gorai.011G1557) 0.078 0.175 0.444

132 2 (Cotton_A_28368) vs. 1 (Gorai.007G2806) 0.013 0.029 0.443

133 2 (Cotton_A_12602) vs. 1 (Gorai.004G0487) 0.018 0.041 0.441

134 2 (Cotton_A_17392) vs. 1 (Gorai.008G2642) 0.016 0.037 0.439

135 2 (Cotton_A_29863) vs. 1 (Gorai.012G0397) 0.011 0.024 0.438

136b 2 (Cotton_A_13296) vs. 1 (Gorai.001G1316) 0.135 0.311 0.433

137 2 (Cotton_A_26223) vs. 1 (Gorai.003G0959) 0.014 0.033 0.433

138 2 (Cotton_A_13509) vs. 1 (Gorai.010G1420) 0.013 0.031 0.426

139b 2 (Cotton_A_16278) vs. 1 (Gorai.010G0536) 0.018 0.041 0.426

140 2 (Cotton_A_10671) vs. 1 (Gorai.007G3063) 0.038 0.090 0.425

141 2 (Cotton_A_13298) vs. 1 (Gorai.001G1314) 0.010 0.024 0.423

142 2 (Cotton_A_38722) vs. 1 (Gorai.011G1213) 0.011 0.027 0.423

143 2 (Cotton_A_25573) vs. 1 (Gorai.009G1720) 0.017 0.040 0.422

144 2 (Cotton_A_21594) vs. 1 (Gorai.006G1964) 0.012 0.028 0.421

145 2 (Cotton_A_04706) vs. 1 (Gorai.009G1765) 0.014 0.032 0.420

146 2 (Cotton_A_17973) vs. 1 (Gorai.005G2043) 0.009 0.021 0.420

147b 2 (Cotton_A_34636) vs. 1 (Gorai.010G0722) 0.011 0.026 0.419

148 2 (Cotton_A_19585) vs. 1 (Gorai.005G0851) 0.016 0.038 0.419

149 2 (Cotton_A_28977) vs. 1 (Gorai.006G0322) 0.016 0.040 0.415

150 2 (Cotton_A_27278) vs. 1 (Gorai.010G2089) 0.020 0.049 0.415

151 2 (Cotton_A_16896) vs. 1 (Gorai.009G0068) 0.021 0.049 0.415

152 2 (Cotton_A_28444) vs. 1 (Gorai.005G1433) 0.013 0.033 0.413

153 2 (Cotton_A_03342) vs. 1 (Gorai.013G1342) 0.014 0.034 0.413

154 2 (Cotton_A_00872) vs. 1 (Gorai.013G0288) 0.012 0.028 0.413

155 2 (Cotton_A_10869) vs. 1 (Gorai.007G0604) 0.016 0.040 0.413

156 2 (Cotton_A_16777) vs. 1 (Gorai.013G1692) 0.023 0.056 0.411

157 2 (Cotton_A_35842) vs. 1 (Gorai.005G0550) 0.017 0.041 0.405

158 2 (Cotton_A_10841) vs. 1 (Gorai.009G0215) 0.011 0.028 0.405

159 2 (Cotton_A_26278) vs. 1 (Gorai.006G1122) 0.008 0.019 0.404

160 2 (Cotton_A_01069) vs. 1 (Gorai.009G0728) 0.011 0.028 0.403

161 2 (Cotton_A_09130) vs. 1 (Gorai.007G0282) 0.014 0.036 0.402

162 2 (Cotton_A_12265) vs. 1 (Gorai.007G1696) 0.017 0.043 0.401

163 2 (Cotton_A_18227) vs. 1 (Gorai.010G0016) 0.010 0.025 0.401

164 2 (Cotton_A_18697) vs. 1 (Gorai.001G2161) 0.017 0.043 0.400

165 2 (Cotton_A_11292) vs. 1 (Gorai.009G1643) 0.026 0.065 0.399

166 2 (Cotton_A_02403) vs. 1 (Gorai.007G0141) 0.010 0.025 0.399

167 2 (Cotton_A_30825) vs. 1 (Gorai.012G0828) 0.010 0.026 0.398

168 2 (Cotton_A_34712) vs. 1 (Gorai.009G3026) 0.016 0.040 0.397

169 2 (Cotton_A_16289) vs. 1 (Gorai.010G0526) 0.025 0.063 0.397

170a 2 (Cotton_A_37656) vs. 1 (Gorai.003G0508) 0.010 0.025 0.396

171a 2 (Cotton_A_03817) vs. 1 (Gorai.007G1554) 0.009 0.022 0.395

172 2 (Cotton_A_13567) vs. 1 (Gorai.008G2191) 0.019 0.049 0.392

173 2 (Cotton_A_36306) vs. 1 (Gorai.006G0750) 0.034 0.087 0.391

174 2 (Cotton_A_15492) vs. 1 (Gorai.007G0816) 0.017 0.043 0.390

175 2 (Cotton_A_26419) vs. 1 (Gorai.008G1310) 0.017 0.044 0.389

176 2 (Cotton_A_41044) vs. 1 (Gorai.005G1192) 0.014 0.035 0.388

177 2 (Cotton_A_32294) vs. 1 (Gorai.001G1527) 0.009 0.022 0.387

178 2 (Cotton_A_31588) vs. 1 (Gorai.012G1018) 0.008 0.021 0.386

179 2 (Cotton_A_27659) vs. 1 (Gorai.013G0860) 0.044 0.115 0.386

180 2 (Cotton_A_15598) vs. 1 (Gorai.010G2189) 0.014 0.036 0.383

181b 2 (Cotton_A_30368) vs. 1 (Gorai.009G2580) 0.010 0.026 0.380

182b 2 (Cotton_A_24432) vs. 1 (Gorai.011G1512) 0.017 0.046 0.378

183 2 (Cotton_A_05741) vs. 1 (Gorai.004G2583) 0.009 0.025 0.378

184 2 (Cotton_A_30003) vs. 1 (Gorai.001G1384) 0.013 0.034 0.377

185 2 (Cotton_A_04299) vs. 1 (Gorai.005G0022) 0.009 0.024 0.376

186 2 (Cotton_A_02560) vs. 1 (Gorai.013G2504) 0.016 0.043 0.372

187 2 (Cotton_A_35950) vs. 1 (Gorai.002G2403) 0.019 0.052 0.371

188ab 2 (Cotton_A_08373) vs. 1 (Gorai.005G0470) 0.013 0.035 0.370

189 2 (Cotton_A_26200) vs. 1 (Gorai.003G0982) 0.015 0.042 0.370

190 2 (Cotton_A_17963) vs. 1 (Gorai.005G2054) 0.007 0.018 0.370

191 2 (Cotton_A_17735) vs. 1 (Gorai.008G2697) 0.017 0.047 0.369

192b 2 (Cotton_A_02057) vs. 1 (Gorai.008G0443) 0.018 0.048 0.368

193 2 (Cotton_A_30072) vs. 1 (Gorai.006G0901) 0.013 0.034 0.364

194 2 (Cotton_A_31804) vs. 1 (Gorai.004G1641) 0.018 0.051 0.363

195 2 (Cotton_A_26302) vs. 1 (Gorai.004G1943) 0.009 0.024 0.362

196 2 (Cotton_A_39558) vs. 1 (Gorai.009G0359) 0.010 0.027 0.362

197 2 (Cotton_A_40005) vs. 1 (Gorai.001G1983) 0.011 0.031 0.361

198 2 (Cotton_A_10110) vs. 1 (Gorai.011G2952) 0.021 0.058 0.360

199 2 (Cotton_A_38063) vs. 1 (Gorai.006G0458) 0.011 0.031 0.359

200 2 (Cotton_A_34312) vs. 1 (Gorai.001G1555) 0.054 0.150 0.357

201 2 (Cotton_A_31539) vs. 1 (Gorai.012G0855) 0.016 0.045 0.355

202 2 (Cotton_A_35528) vs. 1 (Gorai.009G0027) 0.012 0.034 0.355

203 2 (Cotton_A_35532) vs. 1 (Gorai.009G0032) 0.009 0.026 0.354

204 2 (Cotton_A_20411) vs. 1 (Gorai.009G3837) 0.010 0.028 0.353

205 2 (Cotton_A_23909) vs. 1 (Gorai.005G0195) 0.103 0.296 0.349

206 2 (Cotton_A_38340) vs. 1 (Gorai.010G1312) 0.016 0.047 0.348

207 2 (Cotton_A_07872) vs. 1 (Gorai.008G2292) 0.017 0.049 0.347

208 2 (Cotton_A_19584) vs. 1 (Gorai.005G0852) 0.012 0.036 0.346

209 2 (Cotton_A_23809) vs. 1 (Gorai.005G1813) 0.012 0.035 0.346

210 2 (Cotton_A_00454) vs. 1 (Gorai.002G2491) 0.014 0.041 0.343

211 2 (Cotton_A_10325) vs. 1 (Gorai.001G0085) 0.008 0.023 0.342

212 2 (Cotton_A_34264) vs. 1 (Gorai.004G1072) 0.011 0.033 0.341

213 2 (Cotton_A_37646) vs. 1 (Gorai.008G0725) 0.011 0.031 0.340

214 2 (Cotton_A_00653) vs. 1 (Gorai.005G2529) 0.016 0.048 0.340

215 2 (Cotton_A_24627) vs. 1 (Gorai.011G1119) 0.017 0.050 0.340

216 2 (Cotton_A_16144) vs. 1 (Gorai.009G2412) 0.010 0.029 0.339

217a 2 (Cotton_A_02931) vs. 1 (Gorai.011G0480) 0.013 0.039 0.337

218 2 (Cotton_A_35522) vs. 1 (Gorai.009G1913) 0.006 0.018 0.336

219 2 (Cotton_A_25881) vs. 1 (Gorai.008G1273) 0.013 0.038 0.336

220 2 (Cotton_A_10533) vs. 1 (Gorai.013G2656) 0.018 0.054 0.334

221 2 (Cotton_A_09637) vs. 1 (Gorai.006G2666) 0.012 0.035 0.333

222 2 (Cotton_A_01123) vs. 1 (Gorai.009G0676) 0.011 0.034 0.333

223 2 (Cotton_A_24301) vs. 1 (Gorai.004G2353) 0.013 0.039 0.330

224 2 (Cotton_A_05860) vs. 1 (Gorai.001G2321) 0.023 0.071 0.329

225 2 (Cotton_A_26945) vs. 1 (Gorai.009G3291) 0.009 0.027 0.329

226 2 (Cotton_A_12830) vs. 1 (Gorai.013G1878) 0.012 0.037 0.328

227 2 (Cotton_A_15541) vs. 1 (Gorai.013G1581) 0.014 0.042 0.328

228 2 (Cotton_A_06044) vs. 1 (Gorai.012G0114) 0.010 0.031 0.328

229 2 (Cotton_A_07988) vs. 1 (Gorai.008G2109) 0.015 0.045 0.327

230 2 (Cotton_A_17092) vs. 1 (Gorai.009G1232) 0.012 0.037 0.327

231 2 (Cotton_A_31527) vs. 1 (Gorai.012G0866) 0.010 0.032 0.325

232 2 (Cotton_A_15473) vs. 1 (Gorai.007G0798) 0.009 0.028 0.321

233 2 (Cotton_A_09139) vs. 1 (Gorai.007G0293) 0.036 0.112 0.320

234 2 (Cotton_A_32227) vs. 1 (Gorai.010G2146) 0.010 0.031 0.320

235 2 (Cotton_A_09931) vs. 1 (Gorai.004G2035) 0.013 0.042 0.318

236 2 (Cotton_A_23072) vs. 1 (Gorai.011G1462) 0.010 0.030 0.317

237 2 (Cotton_A_32556) vs. 1 (Gorai.006G2453) 0.013 0.042 0.317

238 2 (Cotton_A_27318) vs. 1 (Gorai.013G2436) 0.013 0.041 0.316

239 2 (Cotton_A_33638) vs. 1 (Gorai.010G1884) 0.009 0.029 0.315

240 2 (Cotton_A_21338) vs. 1 (Gorai.008G0551) 0.008 0.025 0.314

241 2 (Cotton_A_25940) vs. 1 (Gorai.006G0534) 0.011 0.036 0.312

242 2 (Cotton_A_21752) vs. 1 (Gorai.002G2007) 0.010 0.033 0.307

243 2 (Cotton_A_01226) vs. 1 (Gorai.009G0573) 0.011 0.036 0.306

244 2 (Cotton_A_30013) vs. 1 (Gorai.001G1393) 0.017 0.057 0.304

245 2 (Cotton_A_28264) vs. 1 (Gorai.012G1386) 0.011 0.037 0.303

246 2 (Cotton_A_09578) vs. 1 (Gorai.001G1213) 0.010 0.032 0.302

247 2 (Cotton_A_18311) vs. 1 (Gorai.012G0512) 0.008 0.025 0.302

248 2 (Cotton_A_32575) vs. 1 (Gorai.008G0863) 0.010 0.034 0.301

249 2 (Cotton_A_17786) vs. 1 (Gorai.002G0314) 0.018 0.059 0.300

250 2 (Cotton_A_27064) vs. 1 (Gorai.006G0233) 0.016 0.054 0.300

251 2 (Cotton_A_01162) vs. 1 (Gorai.009G0634) 0.007 0.022 0.299

252b 2 (Cotton_A_03417) vs. 1 (Gorai.012G1205) 0.010 0.032 0.297

253 2 (Cotton_A_08971) vs. 1 (Gorai.005G2117) 0.011 0.036 0.295

254 2 (Cotton_A_13392) vs. 1 (Gorai.008G1947) 0.011 0.037 0.289

255 2 (Cotton_A_18561) vs. 1 (Gorai.003G1318) 0.015 0.051 0.287

256 2 (Cotton_A_18120) vs. 1 (Gorai.007G2674) 0.009 0.031 0.286

257 2 (Cotton_A_04319) vs. 1 (Gorai.005G0042) 0.018 0.064 0.284

258 2 (Cotton_A_14721) vs. 1 (Gorai.006G0103) 0.013 0.044 0.284

259 2 (Cotton_A_30439) vs. 1 (Gorai.013G1427) 0.009 0.033 0.281

260 2 (Cotton_A_14702) vs. 1 (Gorai.006G0119) 0.009 0.031 0.280

261 2 (Cotton_A_13590) vs. 1 (Gorai.008G2169) 0.018 0.067 0.277

262 2 (Cotton_A_22790) vs. 1 (Gorai.009G3664) 0.019 0.071 0.274

263 2 (Cotton_A_38720) vs. 1 (Gorai.008G0933) 0.011 0.039 0.273

264 2 (Cotton_A_27473) vs. 1 (Gorai.004G0668) 0.023 0.085 0.273

265 2 (Cotton_A_36351) vs. 1 (Gorai.007G2493) 0.012 0.043 0.273

266b 2 (Cotton_A_23070) vs. 1 (Gorai.011G1464) 0.008 0.028 0.273

267 2 (Cotton_A_10904) vs. 1 (Gorai.013G1892) 0.007 0.027 0.269

268 2 (Cotton_A_22715) vs. 1 (Gorai.001G2759) 0.013 0.047 0.269

269 2 (Cotton_A_19225) vs. 1 (Gorai.001G0393) 0.010 0.038 0.268

270 2 (Cotton_A_37812) vs. 1 (Gorai.005G1369) 0.011 0.040 0.266

271b 2 (Cotton_A_23068) vs. 1 (Gorai.011G1466) 0.011 0.042 0.266

272 2 (Cotton_A_14943) vs. 1 (Gorai.012G1311) 0.014 0.052 0.266

273 2 (Cotton_A_34606) vs. 1 (Gorai.005G0815) 0.013 0.050 0.264

274 2 (Cotton_A_37555) vs. 1 (Gorai.007G3444) 0.016 0.062 0.264

275 2 (Cotton_A_18626) vs. 1 (Gorai.001G0621) 0.011 0.041 0.262

276 2 (Cotton_A_36364) vs. 1 (Gorai.013G1093) 0.009 0.033 0.261

277 2 (Cotton_A_00058) vs. 1 (Gorai.002G0209) 0.013 0.051 0.260

278 2 (Cotton_A_15782) vs. 1 (Gorai.008G0143) 0.010 0.038 0.260

279 2 (Cotton_A_07841) vs. 1 (Gorai.009G1982) 0.008 0.029 0.259

280 2 (Cotton_A_23711) vs. 1 (Gorai.006G0946) 0.009 0.035 0.258

281 2 (Cotton_A_07488) vs. 1 (Gorai.011G0852) 0.011 0.042 0.256

282 2 (Cotton_A_14663) vs. 1 (Gorai.007G2016) 0.007 0.029 0.256

283 2 (Cotton_A_31974) vs. 1 (Gorai.008G1999) 0.013 0.051 0.256

284b 2 (Cotton_A_16907) vs. 1 (Gorai.009G0058) 0.004 0.017 0.255

285 2 (Cotton_A_15028) vs. 1 (Gorai.007G2884) 0.012 0.046 0.255

286 2 (Cotton_A_23793) vs. 1 (Gorai.005G1802) 0.012 0.048 0.255

287 2 (Cotton_A_28198) vs. 1 (Gorai.006G0256) 0.029 0.114 0.255

288 2 (Cotton_A_16255) vs. 1 (Gorai.009G0096) 0.009 0.037 0.253

289 2 (Cotton_A_04367) vs. 1 (Gorai.005G0092) 0.014 0.055 0.253

290 2 (Cotton_A_41137) vs. 1 (Gorai.012G0916) 0.008 0.031 0.252

291 2 (Cotton_A_30639) vs. 1 (Gorai.002G2377) 0.013 0.051 0.252

292 2 (Cotton_A_22070) vs. 1 (Gorai.006G0131) 0.009 0.035 0.250

293 2 (Cotton_A_07731) vs. 1 (Gorai.007G0907) 0.011 0.042 0.249

294 2 (Cotton_A_32048) vs. 1 (Gorai.009G1272) 0.009 0.037 0.244

295 2 (Cotton_A_18858) vs. 1 (Gorai.010G0754) 0.010 0.043 0.244

296 2 (Cotton_A_39875) vs. 1 (Gorai.004G1105) 0.011 0.045 0.241

297 2 (Cotton_A_02968) vs. 1 (Gorai.007G1523) 0.008 0.034 0.241

298 2 (Cotton_A_15244) vs. 1 (Gorai.008G0378) 0.011 0.046 0.239

299a 2 (Cotton_A_06850) vs. 1 (Gorai.010G0097) 0.012 0.050 0.238

300 2 (Cotton_A_34461) vs. 1 (Gorai.013G1092) 0.011 0.047 0.237

301b 2 (Cotton_A_28288) vs. 1 (Gorai.010G2281) 0.012 0.050 0.236

302 2 (Cotton_A_00777) vs. 1 (Gorai.005G2405) 0.010 0.043 0.235

303 2 (Cotton_A_00501) vs. 1 (Gorai.002G2447) 0.011 0.048 0.234

304a 2 (Cotton_A_27681) vs. 1 (Gorai.008G1926) 0.011 0.046 0.232

305 2 (Cotton_A_01429) vs. 1 (Gorai.008G2890) 0.009 0.041 0.230

306 2 (Cotton_A_36910) vs. 1 (Gorai.008G0795) 0.007 0.032 0.230

307 2 (Cotton_A_13066) vs. 1 (Gorai.006G1904) 0.015 0.064 0.228

308 2 (Cotton_A_22059) vs. 1 (Gorai.008G1383) 0.009 0.040 0.228

309a 2 (Cotton_A_17619) vs. 1 (Gorai.006G0833) 0.014 0.062 0.228

310 2 (Cotton_A_11411) vs. 1 (Gorai.008G0595) 0.006 0.027 0.226

311 2 (Cotton_A_10175) vs. 1 (Gorai.002G2387) 0.009 0.040 0.225

312 2 (Cotton_A_30897) vs. 1 (Gorai.005G1925) 0.009 0.040 0.224

313 2 (Cotton_A_25648) vs. 1 (Gorai.011G0157) 0.013 0.059 0.223

314 2 (Cotton_A_17704) vs. 1 (Gorai.008G0480) 0.007 0.031 0.221

315 2 (Cotton_A_39243) vs. 1 (Gorai.001G1919) 0.010 0.045 0.220

316 2 (Cotton_A_20788) vs. 1 (Gorai.011G0198) 0.006 0.029 0.220

317 2 (Cotton_A_09744) vs. 1 (Gorai.004G1816) 0.008 0.036 0.217

318 2 (Cotton_A_16921) vs. 1 (Gorai.009G0043) 0.009 0.042 0.216

319 2 (Cotton_A_11129) vs. 1 (Gorai.009G0222) 0.008 0.038 0.216

320 2 (Cotton_A_02867) vs. 1 (Gorai.009G4208) 0.008 0.038 0.214

321a 2 (Cotton_A_00514) vs. 1 (Gorai.002G2433) 0.012 0.057 0.211

322 2 (Cotton_A_10654) vs. 1 (Gorai.007G3077) 0.007 0.034 0.210

323 2 (Cotton_A_06403) vs. 1 (Gorai.012G1462) 0.010 0.048 0.209

324 2 (Cotton_A_06599) vs. 1 (Gorai.007G3630) 0.008 0.040 0.209

325b 2 (Cotton_A_06512) vs. 1 (Gorai.002G1010) 0.009 0.046 0.204

326 2 (Cotton_A_13417) vs. 1 (Gorai.006G1548) 0.007 0.033 0.201

327 2 (Cotton_A_14089) vs. 1 (Gorai.001G1496) 0.009 0.044 0.200

328 2 (Cotton_A_06029) vs. 1 (Gorai.012G0133) 0.009 0.044 0.199

329d 2 (Cotton_A_09201) vs. 1 (Gorai.007G0350) 0.844d 4.238d 0.199

330 2 (Cotton_A_00336) vs. 1 (Gorai.002G2612) 0.007 0.034 0.198

331 2 (Cotton_A_00413) vs. 1 (Gorai.002G2536) 0.010 0.049 0.198

332 2 (Cotton_A_11971) vs. 1 (Gorai.004G0145) 0.017 0.087 0.196

333 2 (Cotton_A_23841) vs. 1 (Gorai.011G2097) 0.007 0.036 0.196

334 2 (Cotton_A_32763) vs. 1 (Gorai.001G1035) 0.008 0.040 0.195

335b 2 (Cotton_A_15714) vs. 1 (Gorai.013G0109) 0.009 0.044 0.194

336 2 (Cotton_A_10104) vs. 1 (Gorai.011G2958) 0.011 0.057 0.191

337 2 (Cotton_A_03168) vs. 1 (Gorai.013G2242) 0.012 0.063 0.190

338 2 (Cotton_A_30406) vs. 1 (Gorai.002G0333) 0.008 0.045 0.187

339 2 (Cotton_A_26582) vs. 1 (Gorai.008G2983) 0.006 0.034 0.186

340 2 (Cotton_A_34750) vs. 1 (Gorai.010G1466) 0.009 0.048 0.184

341 2 (Cotton_A_01310) vs. 1 (Gorai.008G2774) 0.008 0.042 0.182

342 2 (Cotton_A_28304) vs. 1 (Gorai.010G2266) 0.012 0.064 0.179

343 2 (Cotton_A_17970) vs. 1 (Gorai.005G2046) 0.006 0.032 0.177

344 2 (Cotton_A_19296) vs. 1 (Gorai.005G1892) 0.010 0.057 0.176

345b 2 (Cotton_A_11911) vs. 1 (Gorai.004G2907) 0.006 0.035 0.174

346 2 (Cotton_A_14952) vs. 1 (Gorai.012G1318) 0.010 0.061 0.171

347 2 (Cotton_A_05786) vs. 1 (Gorai.004G2627) 0.007 0.041 0.170

348 2 (Cotton_A_21706) vs. 1 (Gorai.005G2333) 0.008 0.050 0.170

349 2 (Cotton_A_34969) vs. 1 (Gorai.007G3347) 0.010 0.062 0.168

350 2 (Cotton_A_23465) vs. 1 (Gorai.009G2221) 0.008 0.049 0.168

351 2 (Cotton_A_23222) vs. 1 (Gorai.007G1207) 0.011 0.063 0.167

352 2 (Cotton_A_22065) vs. 1 (Gorai.008G1377) 0.011 0.068 0.163

353 2 (Cotton_A_11026) vs. 1 (Gorai.009G0768) 0.007 0.044 0.159

354 2 (Cotton_A_08048) vs. 1 (Gorai.008G2059) 0.006 0.041 0.153

355 2 (Cotton_A_00793) vs. 1 (Gorai.005G2375) 0.010 0.069 0.152

356b 2 (Cotton_A_26149) vs. 1 (Gorai.009G1519) 0.008 0.050 0.151

357 2 (Cotton_A_07893) vs. 1 (Gorai.008G2279) 0.006 0.039 0.150

358 2 (Cotton_A_01507) vs. 1 (Gorai.003G0040) 0.006 0.037 0.147

359 2 (Cotton_A_30269) vs. 1 (Gorai.006G1151) 0.006 0.042 0.137

360 2 (Cotton_A_09295) vs. 1 (Gorai.007G2759) 0.006 0.042 0.133

361 2 (Cotton_A_00464) vs. 1 (Gorai.002G2481) 0.008 0.058 0.132

362 2 (Cotton_A_12077) vs. 1 (Gorai.006G1689) 0.004 0.034 0.131

363 2 (Cotton_A_16188) vs. 1 (Gorai.011G0940) 0.005 0.035 0.130

364 2 (Cotton_A_16572) vs. 1 (Gorai.006G0758) 0.007 0.058 0.127

365b 2 (Cotton_A_08850) vs. 1 (Gorai.012G1593) 0.004 0.033 0.124

366 2 (Cotton_A_04919) vs. 1 (Gorai.010G2491) 0.004 0.040 0.111

367 2 (Cotton_A_34884) vs. 1 (Gorai.010G0323) 0.005 0.049 0.104

368 2 (Cotton_A_21237) vs. 1 (Gorai.011G2051) 0.002 0.020 0.100

369 2 (Cotton_A_27764) vs. 1 (Gorai.003G1285) 0.004 0.040 0.097

370 2 (Cotton_A_02937) vs. 1 (Gorai.011G0486) 0.004 0.041 0.096

371 2 (Cotton_A_29102) vs. 1 (Gorai.008G0687) 0.004 0.038 0.091

372 2 (Cotton_A_19432) vs. 1 (Gorai.001G2565) 0.004 0.050 0.084

373a 2 (Cotton_A_13069) vs. 1 (Gorai.006G1901) 0.003 0.043 0.063

374 2 (Cotton_A_11956) vs. 1 (Gorai.004G0129) 0.002 0.035 0.048

375a 2 (Cotton_A_28832) vs. 1 (Gorai.002G1171) 0.002 0.047 0.040

376b 2 (Cotton_A_34882) vs. 1 (Gorai.010G0325) 0.002 0.045 0.034

377 2 (Cotton_A_36582) vs. 1 (Gorai.009G3548) 0.000 0.021 0.000

Note:

a This pair of homologous sequences includes a PPR candidate sequence from the A2 genome;

b This pair of homologous sequences includes a PPR candidate sequence from the D5 genome;

c This pair of homologous sequences has the highest dN/dS value (shown in bold and underlined);

d This pair of homologous sequences has the highest dN and dS values (shown in bold and underlined).
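The dN/dS column of Table S3 is the ratio of the nonsynonymous substitution rate to the synonymous substitution rate. The sketch below recomputes that ratio from tabulated rates and applies the conventional reading: a ratio above 1 suggests positive selection, and a ratio below 1 suggests purifying selection. The function names and the tolerance around 1 are illustrative assumptions, not part of the study. Because the table lists rates rounded to three decimals, a recomputed ratio can differ slightly from the tabulated one.

```python
def dn_ds(dn: float, ds: float) -> float:
    """Ratio of nonsynonymous (dN) to synonymous (dS) substitution rates."""
    if ds <= 0:
        raise ValueError("dS must be positive for the ratio to be defined")
    return dn / ds


def selection_mode(omega: float, tol: float = 0.05) -> str:
    """Conventional reading of dN/dS; `tol` is an arbitrary illustrative band."""
    if omega > 1 + tol:
        return "positive"
    if omega < 1 - tol:
        return "purifying"
    return "near-neutral"


# Pair 368 in Table S3: dN = 0.002, dS = 0.020
omega_368 = dn_ds(0.002, 0.020)
print(round(omega_368, 3), selection_mode(omega_368))
```

Under this reading, most of the 377 pairs in Table S3 have ratios well below 1, while the first few pairs have ratios at or above 1.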

Additional file 4: Figure S3. GO annotation of PPR homologous genes between the G. raimondii (D5) and G. arboreum (A2) genomes. (a) GO bar chart of D5-A2 PPR homologs in second-level GO terms of the 3 main GO categories (biological process, cellular component and molecular function). Input list: D5-A2 PPR homologous sequences. Background/reference: cotton genome loci (Phytozome). (b) GO hierarchical graph of D5-A2 PPR homologs for biological process. The more statistically significant a term, the darker its node color. (c) GO hierarchical graph of D5-A2 PPR homologs for cellular component. The more statistically significant a term, the darker its node color.

Additional file 5: Table S4. Sub-family analysis of 70 PPR candidate genes in G. raimondii

(D5), G. arboreum (A2) and G. hirsutum (AD1) genomes. Table S5. Sub-family analysis of 8

PPR candidate genes in G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1) genomes.

Table S4 Sub-family analysis of 70 PPR candidate genes in G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1) genomes

Chromosome  Gene^a  Subfamily  No. of amino acids  No. of motifs  Motif arrangement^b

Gossypium raimondii
Chr01  Gorai.001G1316  P  419  12  8-P-P-P-P-P-P-P-P-P-P-P-P-6
Chr02  Gorai.002G0718(1)  P  787  18  63-P-22-P-32-P-35-P-P-P-P-P-P-P-P-P-P-P-4-P-P-P-P-13
Chr02  Gorai.002G0718(2)  P  787  18  63-P-22-P-32-P-35-P-P-P-P-P-P-P-P-P-P-P-4-P-P-P-P-13
Chr02  Gorai.002G1010  P  1063  25  84-P-P-P-P-P-P-P-P-3-P-P-P-P-P-P-6-P-P-1-P-3-P-P-P-P-39-P-5-P-P-P-68
Chr03  Gorai.003G1716  P  598  13  115-P-P-P-P-P-P-S-5-P-P-P-1-P-P-P-28
Chr04  Gorai.004G2406  P  509  12  69-P-P-P-P-3-P-P-P-P-P-P-P-P-21
Chr04  Gorai.004G2438  P  647  15  101-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-21
Chr04  Gorai.004G2907  P  584  11  19-P-35-P-P-3-P-P-P-P-P-P-P-P-144
Chr06  Gorai.006G2252  P  508  4  144-P-P-1-P-P-227
Chr06  Gorai.006G2471  P  638  14  106-P-P-P-P-P-P-P-S-4-P-P-3-P-3-P-P-P-41
Chr07  Gorai.007G1431  P  366  9  30-P-P-P-P-P-P-2-P-P-P-35
Chr08  Gorai.008G0443  P  631  13  152-P-1-P-P-P-P-P-P-P-P-P-P-P-P-25
Chr09  Gorai.009G0058  P  716  15  167-P-P-P-P-P-2-P-P-P-P-P-P-P-P-P-P-27
Chr09  Gorai.009G1519  P  692  14  141-P-P-P-P-P-P-3-P-P-P-3-P-P-P-P-33-P-35
Chr09  Gorai.009G2580  P  632  12  86-P-43-P-P-P-P-P-P-P-P-P-P-5-P-87
Chr09  Gorai.009G3762  P  435  6  166-P-P-8-P-P-P-P
Chr10  Gorai.010G0325(1)  P  595  14  105-P-P-P-2-P-P-P-P-P-P-P-P-P-P-P-11
Chr10  Gorai.010G0325(2)  P  554  14  64-P-P-P-2-P-P-P-P-P-P-P-P-P-P-P-11
Chr10  Gorai.010G0325(3)  P  595  14  105-P-P-P-2-P-P-P-P-P-P-P-P-P-P-P-11
Chr10  Gorai.010G0536(1)  P  722  14  117-P-36-S-4-P-P-P-P-P-1-P-P-41-P-P-P-P-P-46

Chr10  Gorai.010G0536(2)  P  646  14  41-P-37-S-4-P-P-P-P-P-1-P-P-41-P-P-P-P-P-46
Chr10  Gorai.010G0722(1)  P  632  10  110-P-P-P-4-P-P-P-5-P-P-P-P-136
Chr10  Gorai.010G0722(2)  P  632  10  110-P-P-P-4-P-P-P-5-P-P-P-P-136
Chr10  Gorai.010G2281(1)  P  638  14  122-P-5-P-P-1-P-P-P-S-4-P-P-1-P-2-P-P-P-P-24
Chr10  Gorai.010G2281(2)  P  638  14  122-P-5-P-P-1-P-P-P-S-4-P-P-1-P-2-P-P-P-P-24
Chr11  Gorai.011G1450(1)  P  586  15  68-P-P-P-P-P-P-P-P-P-P-P-P-P-P-2-P
Chr11  Gorai.011G1451(1)  P  558  14  47-P-P-P-P-P-P-P-P-P-P-P-P-P-P-21
Chr11  Gorai.011G1464(1)  P  519  13  43-P-P-P-P-P-P-P-P-1-P-P-P-P-P-21
Chr11  Gorai.011G1464(2)  P  367  10  9-P-P-P-P-P-1-P-P-P-P-P-8
Chr11  Gorai.011G1465  P  93  3  8-P-P-P
Chr11  Gorai.011G1466  P  371  8  99-P-P-P-P-P-P-P-P-2
Chr11  Gorai.011G1511  P  73  1  25-P-13
Chr11  Gorai.011G1512  P  626  15  54-P-35-P-P-P-P-P-P-P-P-P-P-P-P-P-P-21
Chr11  Gorai.011G1514  P  442  10  113-P-P-P-P-P-P-P-P-P-P
Chr11  Gorai.011G1515  P  824  20  65-P-P-P-P-P-P-P-P-2-P-P-P-P-2-P-1-P-63-P-P-P-P-P-P-1
Chr11  Gorai.011G1557  P  536  13  67-P-1-P-P-P-P-P-P-P-19-P-P-P-2-P-P-17
Chr12  Gorai.012G0303(1)  P  524  10  136-P-P-P-1-P-P-P-P-P-17-P-P-20
Chr12  Gorai.012G0303(2)  P  431  9  44-P-62-P-P-P-P-P-P-P-P-14
Chr12  Gorai.012G1205(1)  P  763  14  193-P-P-P-P-P-P-P-P-P-P-P-P-P-P-84
Chr12  Gorai.012G1205(2)  P  763  14  193-P-P-P-P-P-P-P-P-P-P-P-P-P-P-84
Chr12  Gorai.012G1494(1)  P  536  9  168-P-P-P-P-P-P-P-P-1-P-55
Chr12  Gorai.012G1494(2)  P  384  9  16-P-P-P-P-P-P-P-P-1-P-55


Chr12  Gorai.012G1593(1)  P  868  18  72-S-6-P-P-P-P-P-P-P-P-P-P-P-25-P-P-P-P-P-45-P-106
Chr12  Gorai.012G1593(2)  P  868  18  72-S-6-P-P-P-P-P-P-P-P-P-P-P-25-P-P-P-P-P-45-P-106
Chr12  Gorai.012G1593(3)  P  868  18  72-S-6-P-P-P-P-P-P-P-P-P-P-P-25-P-P-P-P-P-45-P-106
Chr13  Gorai.013G0109(1)  P  960  15  316-P-P-P-2-P-1-P-P-P-P-P-P-P-P-P-P-P-123
Chr13  Gorai.013G0109(2)  P  755  12  316-P-P-P-2-P-1-P-P-P-P-P-P-P-P-23
Chr13  Gorai.013G0606(1)  P  555  13  43-P-P-35-P-P-P-P-P-P-P-P-2-P-P-P-22
Chr13  Gorai.013G0606(2)  P  416  11  9-P-P-P-P-P-P-P-P-2-P-P-P-22

Gossypium arboreum
Chr01  Cotton_A_32157  P  195  5  9-P-P-P-P-P-13
Chr03  Cotton_A_26557  P  440  11  8-P-4-P-P-P-P-P-P-P-2-P-P-P-45
Chr03  Cotton_A_03817  P  523  9  153-S-4-P-3-P-P-4-P-P-2-P-35-P-P-24
Chr04  Cotton_A_06850  P  704  7  368-P-S-5-P-35-P-P-P-P-21
Chr05  Cotton_A_26837  P  415  11  8-P-P-P-P-P-P-P-P-P-P-P-22
Chr06  Cotton_A_04606  P  566  8  169-P-P-34-P-P-P-P-35-P-P-48
Chr06  Cotton_A_29300  P  587  14  54-P-35-P-P-1-P-P-P-P-P-P-P-P-P-P-P-21
Chr07  Cotton_A_16847  P  573  13  103-P-P-P-P-P-3-P-P-P-P-P-P-P-P-12
Chr07  Cotton_A_18522  P  622  14  111-P-P-P-P-P-P-P-P-P-P-P-P-P-P-21
Chr08  Cotton_A_02931  P  734  14  173-P-P-P-P-3-P-P-P-P-P-3-P-P-P-P-36-P-39
Chr09  Cotton_A_27681  P  759  19  38-P-9-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-4-P-P-P-54
Chr10  Cotton_A_13069  P  466  8  130-S-3-P-P-5-P-P-P-P-P-54
Chr10  Cotton_A_17619  P  398  6  175-P-P-P-P-P-P-16


Chr10  Cotton_A_14743  P  544  14  8-P-35-P-P-P-P-P-P-P-P-P-P-P-P-P-21
Chr12  Cotton_A_37656  P  1525  18  239-P-37-P-P-P-P-P-P-P-72-P-36-P-1-S-41-P-41-P-P-35-P-1-P-P-35-P-376
Chr12  Cotton_A_28832  P  706  14  165-P-3-P-41-P-1-P-1-P-P-P-P-P-P-P-P-P-P-21
Chr12  Cotton_A_00514  P  817  20  108-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-1-P-1-P-12

Gossypium hirsutum
GhBah0036h09  P  480  8  104-S-42-P-41-P-P-P-P-P-1-P-24
GhDeg5330  P  471  9  90-S-5-S-4-P-P-4-P-P-P-P-P-66
GhI12  P  458  7  118-P-36-P-P-P-P-P-4-P-59
GhK14  P  846  16  273-P-P-P-P-P-P-3-P-P-P-P-P-3-P-P-P-P-P-13
GhPPR3  P  547  10  155-P-2-P-1-P-P-P-P-1-P-P-5-P-1-P-42
GhPPR4  P  337  4  175-P-P-P-P-22
GhPPR5  P  288  7  11-P-P-2-P-P-2-P-P-5-P-29
GhPPRH1  P  638  17  P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-1-P-47
GhPPRH2  P  851  16  237-P-P-P-P-P-P-P-P-P-P-P-35-P-P-P-P-P-21
GhMX55E05  E  522  12  87-L-7-S-3-P-L-S-P-L-S-P-L2-S-9-E
GhPPR1  E  532  11  73-P-L-S-S-P-4-L-S-P-L2-S-4-E-46
GhCRR4  E+  637  15  85-P-L-S-S-2-S-S-S-P-4-L-2-S-P-L2-1-S-4-E-E+-12
Gh155c17  DYW  875  21  65-P-2-L-2-S-P-L-S-P-L-1-S-P-L-4-S-P-2-L-S-P-L2-S-5-E-E+-DYW
GhMX089E03  DYW  775  19  32-S-P-L-S-P-L-S-P-L-S-P-3-L-1-S-P-L2-S-4-E-E+-DYW
GhPPR2  DYW  592  11  116-L-1-S-P-L-S-P-L2-S-7-E-E+-DYW

Note: (a) Two or more transcripts of the same gene are distinguished by different numbers; (b) The numbers in the motif arrangement represent the number of amino acids between two adjacent motifs.
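The motif-arrangement notation used in Table S4 can be checked mechanically against the reported motif counts. The following is a minimal sketch, not part of the original analysis. It assumes that tokens are separated by hyphens, that purely numeric tokens are amino-acid gaps (including any leading or trailing flank), and that every other token is a motif label. The function name `parse_arrangement` is hypothetical.

```python
from collections import Counter


def parse_arrangement(arrangement):
    """Split a motif-arrangement string into motif labels and gap lengths.

    Assumption: numeric tokens are amino-acid gaps; anything else
    (P, S, L, L2, E, E+, DYW) is treated as a motif label.
    """
    motifs, gaps = [], []
    for token in arrangement.split("-"):
        if token.isdigit():
            gaps.append(int(token))
        else:
            motifs.append(token)
    return motifs, gaps


# Two rows copied from Table S4: Gorai.001G1316 and GhPPR2
p_motifs, p_gaps = parse_arrangement("8-P-P-P-P-P-P-P-P-P-P-P-P-6")
dyw_motifs, dyw_gaps = parse_arrangement("116-L-1-S-P-L-S-P-L2-S-7-E-E+-DYW")

print(len(p_motifs), dict(Counter(p_motifs)), p_gaps)
print(len(dyw_motifs), dict(Counter(dyw_motifs)), dyw_gaps)
```

For these two rows the parsed motif counts (12 and 11) agree with the "No. of motifs" column, which suggests the column counts every motif label, whatever its type.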

Table S5 Sub-family analysis of 8 PPR candidate genes in G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1) genomes

Gene  Subfamily  No. of amino acids  Motif number  Motif arrangement

Gorai.006G2471 P 638 14 106-P-P-P-P-P-P-P-S-4-P-P-3-P-3-P-P-P-41

Gorai.005G0470 P 641 14 124-P-P-P-P-P-P-P-P-P-3-P-1-P-P-P-P-29

Cotton_A_08373 P 643 14 124-P-P-P-P-P-P-P-P-P-P-1-P-P-P-P-29

Cotton_A_26557 P 440 11 8-P-4-P-P-P-P-P-P-P-2-P-P-P-45

Gorai.007G1431 P 366 9 30-P-P-P-P-P-P-2-P-P-P-35

PhRf_PPR592 P 592 13 44-P-70-P-3-P-P-P-P-P-P-P-P-P-P-P-33

OsRf1a P 791 17 87-P-P-P-6-P-P-P-P-P-P-P-1-P-P-P-P-P-2-P-4-P-107


OsRf1b P 506 12 28-P-37-P-1-P-P-2-P-P-P-P-P-P-P-P-30

GhPPR3 P 547 10 155-P-2-P-1-P-P-P-P-1-P-P-5-P-1-P-42

GhK14 P 846 16 273-P-P-P-P-P-P-3-P-P-P-P-P-3-P-P-P-P-P-13

Gorai.010G0536(1) P 646 14 41-P-37-S-4-P-P-P-P-P-1-P-P-41-P-P-P-P-P-46

Gorai.010G0536(2) P 722 14 117-P-36-S-4-P-P-P-P-P-1-P-P-41-P-P-P-P-P-46

BnPPR_B_L1 P 667 15 109-P-P-1-P-P-4-P-P-P-P-P-P-P-11-P-P-P-P-25

RsRfo(PPR_B) P 687 16 51-P-35-P-P-P-P-3-P-P-P-P-P-P-P-13-P-P-P-P-38

AtRPF1 P 602 13 121-P-P-P-P-P-P-P-P-P-P-P-P-P-26

Additional file 6: Table S6. List of primers used for qRT-PCR.

Table S6 List of primers used for qRT-PCR

Gene Sequence (5' to 3') Tm (℃) Length (bp)

Gorai.005G0470 F: TGGTCAGTCTCCAGCGTTATCTACA 62.0 25

R: GTATGCTGAAATGCTCAATGCTCG 60.3 24

Gorai.006G2471 F: GAGCCTGATTACGCTACTCTTGG 62.0 23

R: AAAACATCACCTTGAAACCCTCTT 56.8 24

Gorai.007G1431 F: GAGAAGTTGGAAGAAGCGAATCAGTT 60.4 26

R: CTTACCAGCCAAGCAATACCCATC 62.0 24

Gorai.010G0536 F: CATTGATGGGAAACCAACCGTG 60.1 22

R: GTGGATGCAACTGGTGGAGGAC 63.8 22

Cotton_A_26557 F: AGGCAGGAAAGGTTGACGAAGC 61.9 22

R: CCAGTGCCTCTGAGTCACAATCG 63.7 23

Cotton_A_08373 F: TTCCAAGAAGGGCAAGTGAGC 60.0 21

R: ATCAAAAGCCTCCTCAATGTGG 58.2 22

GhPPR3 F: TTTGTTGAGGTTAGACGAGGTTTAC 58.7 25

R: TCATACTTCTTCGCCTTACAATACG 58.7 25

GhK14 F: TCTCTCCTAACAATCCTCCTACCGT 62.0 25

R: GACATCAATAGCGTAAGTAAAACCCAC 60.5 27

UBQ7 F: GAAGGCATTCCACCTGACCAAC 61.9 22

R: CTTGACCTTCTTCTTCTTGTGCTTG 60.3 25
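The Length column of Table S6 can be recomputed directly from the primer sequences. The sketch below is only a sanity check on the two UBQ7 primers copied from the table. GC content is an extra quantity that the table does not report, and the helper name `gc_percent` is hypothetical. Tm is not recomputed because the source does not state which formula was used.

```python
# Primer sequences and reported lengths (bp) copied from Table S6
primers = {
    "UBQ7 F": ("GAAGGCATTCCACCTGACCAAC", 22),
    "UBQ7 R": ("CTTGACCTTCTTCTTCTTGTGCTTG", 25),
}


def gc_percent(seq):
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


for name, (seq, reported_length) in primers.items():
    # Compare the computed length with the value reported in the table
    status = "ok" if len(seq) == reported_length else "mismatch"
    print(name, len(seq), reported_length, status, round(gc_percent(seq), 1))
```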


Acknowledgments

We are indebted to Dr. Anming Ding (Tobacco Research Institute, Chinese Academy of Agricultural Sciences, Qingdao, China) for supplying the HMMER matrix of the PPR gene family in Arabidopsis (defined by Prof. Ian Small). We thank Dr. Zhen Su (State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, China) for helpful advice and discussion. This work was supported by the National Natural Science Foundation of China (31671741) and the National Key R&D Program for Crop Breeding (2016YFD0100203) to J. Hua.



Table captions

Table 1 Identification of PPR gene family in D5 and A2 genomes of Gossypium(a)

Chromosome(b)  G. raimondii: No. of PPR loci  G. raimondii: No. of PPR motifs  G. arboreum: No. of PPR loci  G. arboreum: No. of PPR motifs

chr01  35  210  32  149
chr02  30  229  18  218
chr03  20  195  26  154
chr04  29  241  33  172
chr05  40  306  27  157
chr06  37  299  49  225
chr07  45  244  25  148
chr08  48  271  33  218
chr09  75  799  34  191
chr10  32  235  31  155
chr11  32  349  42  231
chr12  26  221  39  205
chr13  33  244  30  144
Total No.(c)  482  3843  433(d)  2367

Note:

(a) The numbers of PPR genes in the D5 and A2 genomes are "clean" data obtained after two filtering steps.

(b) The chromosome and number marked with a double underline indicate the location and value of the maximum number of PPR loci, while those marked with a single underline indicate the location and value of the minimum number of PPR loci.

(c) The total numbers of PPR genes in the D5 and A2 genomes are marked in bold only.

(d) There were 14 PPR loci identified on large scaffolds.
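The totals in Table 1 can be reproduced from the per-chromosome counts, which also makes the role of note d explicit. The sketch below simply re-adds the values copied from the table. The variable names are illustrative, and it assumes the 14 scaffold loci are counted in the A2 locus total.

```python
# Per-chromosome counts copied from Table 1 (chr01 to chr13)
chromosomes = [f"chr{i:02d}" for i in range(1, 14)]
d5_loci = [35, 30, 20, 29, 40, 37, 45, 48, 75, 32, 32, 26, 33]
d5_motifs = [210, 229, 195, 241, 306, 299, 244, 271, 799, 235, 349, 221, 244]
a2_loci = [32, 18, 26, 33, 27, 49, 25, 33, 34, 31, 42, 39, 30]
a2_motifs = [149, 218, 154, 172, 157, 225, 148, 218, 191, 155, 231, 205, 144]
a2_scaffold_loci = 14  # note d: loci located on large scaffolds

d5_total_loci = sum(d5_loci)
d5_total_motifs = sum(d5_motifs)
a2_total_loci = sum(a2_loci) + a2_scaffold_loci
a2_total_motifs = sum(a2_motifs)

# Chromosomes with the most and fewest PPR loci (the underlined cells in note b)
d5_max = max(zip(d5_loci, chromosomes))
d5_min = min(zip(d5_loci, chromosomes))
a2_max = max(zip(a2_loci, chromosomes))
a2_min = min(zip(a2_loci, chromosomes))

print(d5_total_loci, d5_total_motifs, a2_total_loci, a2_total_motifs)
print(d5_max, d5_min, a2_max, a2_min)
```

The sums match the reported totals (482 and 3843 for D5, 433 and 2367 for A2). The A2 locus total needs the 14 scaffold loci, whereas the A2 motif total equals the chromosome sum alone.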


Table 2 Expression analysis of PPR candidate genes in G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1) genomes

Gene  RPKM value-Fold: 2074A  2074B  AE1

Gorai.010G0536(1) 1.00 1.50 5.71

Gorai.010G0536(2) 1.00 1.63 6.57

Gorai.007G1431 1.00 1.20 21.93

Cotton_A_26557 1.00 1.32 20.84

Gorai.006G2471 1.00 0.16 1.27

Cotton_A_08373 1.00 0.41 12.59

GhPPR3 1.00 0.00 5.22

GhK14 1.00 0.05 4.93

Gorai.005G0470 1.00 0.63 3.46