The pomegranate (Punica granatum L.) genome and the ...

67
Accepted Article This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/tpj.13625 This article is protected by copyright. All rights reserved. PROFESSOR YILIU XU (Orcid ID : 0000-0003-1564-9972) Article type : Resource The pomegranate (Punica granatum L.) genome and the genomics of punicalagin biosynthesis Gaihua Qin 1,2,* , Chunyan Xu 3,* , Ray Ming 4,5* , Haibao Tang 4 , Romain Guyot 6 , Elena M. Kramer 7 , Yudong Hu 3 , Xingkai Yi 1,2 , Yongjie Qi 1,2 , Xiangyang Xu 3 , Zhenghui Gao 1,2 , Haifa Pan 1,2 , Jianbo Jian 3 , Yinping Tian 3 , Zhen Yue 3,# & Yiliu Xu 1,2,# 1 Institute of Horticulture, Anhui Academy of Agricultural Sciences, Hefei 230031, China; 2 Key Laboratory of Genetic Improvement and Ecophysiology of Horticultural Crops, Anhui Province, Hefei 230031, China 3 BGI Genomics, BGI-Shenzhen, Shenzhen 518083, China; 4 Fujian Agriculture and Forestry University and University of Illinois at Urbana-Champaign School of Integrative Biology Joint Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou 350002, China; 5 Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61822, USA

Transcript of The pomegranate (Punica granatum L.) genome and the ...

Page 1: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/tpj.13625

This article is protected by copyright. All rights reserved.

PROFESSOR YILIU XU (Orcid ID : 0000-0003-1564-9972)

Article type : Resource

The pomegranate (Punica granatum L.) genome and the

genomics of punicalagin biosynthesis

Gaihua Qin1,2,*, Chunyan Xu3,*, Ray Ming4,5*, Haibao Tang4, Romain Guyot6, Elena M.

Kramer7, Yudong Hu3, Xingkai Yi1,2, Yongjie Qi1,2, Xiangyang Xu3, Zhenghui Gao1,2, Haifa

Pan1,2, Jianbo Jian3, Yinping Tian3, Zhen Yue3,# & Yiliu Xu1,2,#

1Institute of Horticulture, Anhui Academy of Agricultural Sciences, Hefei 230031, China;

2Key Laboratory of Genetic Improvement and Ecophysiology of Horticultural Crops, Anhui

Province, Hefei 230031, China;

3BGI Genomics, BGI-Shenzhen, Shenzhen 518083, China;

4Fujian Agriculture and Forestry University and University of Illinois at Urbana-Champaign

School of Integrative Biology Joint Center for Genomics and Biotechnology, Fujian

Agriculture and Forestry University, Fuzhou 350002, China;

5Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois

61822, USA;

Page 2: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

6Institut de Recherche pour le Développement, Diversité, Adaptation et Développement des

Plantes, Montpellier 34394, France;

7Department of Organismic and Evolutionary Biology, Harvard University, Cambridge MA

02138, USA

* Equal contributors.

# Corresponding authors.

Yiliu Xu

Address: No. 40 Nongkenan Road, Hefei, Anhui Province, PR China

Tel: 0086-551-62160121

Fax: 0086-551-65160937

E-mail:[email protected]

Zhen Yue

Address: No. 21 Hongan 3rd Street, Yantian District, Shenzhen, 518083, PR China

Tel: 0086- 189-2675-8015

Fax: 0086-755-36307125

Page 3: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

E-mail:[email protected]

Running head: Pomegranate genome

Keywords: Pomegranate; Genome; System evolution; Adaption; Punicalagin biosynthesis

SUMMARY

Pomegranate (Punica granatum L.) is a perennial fruit crop grown since ancient times

has been planted worldwide and is known for its functional metabolites, particularly

punicalagins. We have sequenced and assembled the pomegranate genome with 328 Mb

anchored into nine pseudo-chromosomes and annotated 29 229 gene models. A Myrtales

lineage-specific whole-genome duplication (WGD) event was detected that occurred in the

common ancestor before the divergence of pomegranate and Eucalyptus. Repetitive

accounted for 46.1% of the assembled genome. We found that the integument development

gene INNER NO OUTER (INO) was under positive selection and potentially contributed to

development of the fleshy outer layer of the seed coat, an edible part of pomegranate fruit.

genes encoding the enzymes for synthesis and degradation of lignin, hemicelluloses, and

cellulose were also differentially expressed between soft- and hard-seeded varieties,

differences in their accumulation in cultivars differing in seed hardness. Candidate genes for

punicalagin biosynthesis were identified and their expression patterns indicated that gallic

synthesis in tissues could follow different biochemical pathways. The genome sequence of

pomegranate provides a valuable resource for the dissection of many biological and

Page 4: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

biochemical traits and also provides important insights for the acceleration of breeding.

Elucidation of the biochemical pathway(s) involved in punicalagin biosynthesis could assist

breeding efforts to increase production of this bioactive compound.

INTRODUCTION

Pomegranate (P. granatum L., Lythraceae) is a perennial fruit tree that is native from Iran

to the Himalayan Mountains in northern India (Morton and Dowling, 1987). It has been

cultivated since prehistoric times and was domesticated during the Neolithic period (Levin,

2006; Seeram et al., 2006). Pomegranate was introduced to China during the Tang Dynasty

(618–907 AD) via the Silk Road (McColl, 1991; Bretschneider, 1935), and has been

cultivated widely since then. Today, China has become one of the world’s main producers of

pomegranate (http://www.discoverlife.org/mp/20m?kind=Punica+granatum). In addition to

being consumed as fresh fruit, juice, and wine, pomegranate is also used as a traditional

medicine in many countries (Li, 1998; Kim et al., 2002; Sanchez-Lamar et al., 2008).

The potential antioxidant, anticancer, and antiatherosclerotic properties of the phenolic

compounds in pomegranate fruit have been extensively investigated (Longtin, 2003;

Sreekumar et al., 2014). These bioactive phenolic compounds include ellagitannins (ETs),

anthocyanins, flavonols, and flavonoids. Punicalagin (an ET unique to pomegranate), gallagic

acid, ellagic acid (EA), and punicalin are the most abundant antioxidant polyphenolic

compounds in pomegranate (Gil et al., 2000; Tzulker et al., 2007). Ingested EA from

pomegranate juice does not accumulate in the blood (Seeram et al., 2004); thus, EA from

pomegranate juice does not appear to be directly biologically active in vivo. However,

punicalagin is hydrolyzed in the gut to release EA, which is then further processed by the

microflora into urolithins (Cerda et al., 2004; Selma et al., 2014), which have biological

Page 5: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

in organisms (Espin et al., 2013; Ryu et al., 2016). Urolithin A (UA) might prevent

mitochondrial dysfunction with aging to extend lifespan in C. elegans, and to prevent

age-related muscle function decline in mice (Ryu et al., 2016). To date, 69 clinical trials have

been registered with the National Institutes of Health to examine the effects of consumption

pomegranate extracts or juice on a variety of human disorders such as cancer, cardiovascular

diseases, and aging (https://clinicaltrials.gov/ct2/results?term=pomegranate). Although there

have been many studies to identify and characterize pomegranate phenolic compounds

et al., 2011; Nallely et al., 2015), and their nutritional and medical properties (Viuda-Martos

al., 2010; Landete et al., 2011), the absence of genomic data and need for more

information for pomegranate have seriously limited its genetic improvement.

is the first fruit tree in the Myrtales to be sequenced. Its genome will provide an important

reference for genetic improvement to increase the content of punicalagin and other secondary

metabolites.

In addition to its rich array of secondary metabolites, unique morphological features make

pomegranate attractive for studying system evolution and selective adaptation. The word

pomegranate is derived from Old French pome grenate (Modern French grenade) and arose

directly from Medieval Latin pomum granatum, literally ‘apple with many seeds’ (Holland et

al., 2009), although the fruit is actually a berry. This fruit was mentioned in the Bible (Janick,

2007) and Koran (Langley, 2000) and is often associated with fertility. The hundreds of seeds

in each fruit are derived from an equal number of ovules distributed among ~5–7 carpels,

which makes pomegranate an interesting subject for the study of reproductive development

and evolutionary biology. The edible fleshy outer seed coat that encapsulates the fibrous

inner seed coat is unusual among fruits and has important evolutionary and adaptive

significance. Here, we report a high-quality draft of the pomegranate genome, which not only

valuable resources for dissection of many biological and metabolic traits, but also provides a

Page 6: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

powerful tool to accelerate breeding to increase production of its bioactive compounds,

particularly punicalagin.

RESULTS

Genome assembly and annotation

The genome of an elite pomegranate cultivar ‘Dabenzi’ (2n = 2x = 18) (Figure S1),

commonly grown in Anhui Province, China, was sequenced at 100× coverage using Illumina

paired-end reads of libraries with insert sizes ranging from 170 bp to 40 kb (Table S1). The

final genome assembly is 328.38 Mb, or ~92% of the estimated total genome size based on

17-mer analysis (356.98 Mb) (Figure S2), close to the estimates obtained by flow cytometry

(Figure S3). The N50 contig length is 66.97 kb and the N50 scaffold length is 1.89 Mb (Table

S2), which is a satisfactory result from assembling only whole-genome shotgun reads. When

the quality of the genome was assessed using assembled unigenes from the transcriptome,

more than 97.37 % of 105 472 unigenes could be aligned to the genome, which indicated

extensive genomic coverage (Table S3).

To construct pseudo-chromosomes, we sequenced 195 F1 individuals from the cross

between ‘Dabenzi’ and ‘Baiyushizi’, constructed a genetic map for ‘Dabenzi’ spanning

1018.67 cM with an average of 232.45 kb/cM, and established nine linkage groups

corresponding to the haploid chromosome number of pomegranate (Figure S4 and Table S4).

A total of 310 scaffolds (>2 Kb) were anchored to the nine linkage groups with 2421 SNP

markers, corresponding to 70.32% of the reference genome. Among these, 140 scaffolds,

corresponding to 62.8 % of the reference genome (Table S5), were oriented.

Page 7: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

A combined structure- and homology-based analysis identified 155.3 Mb of repetitive

sequences that account for 46.1% of the pomegranate assembly. Analysis of these repetitive

sequences indicated that retrotransposons comprised 40.5% of the pomegranate genome and

that only 2.1% of these repeats were uncharacterized. Long terminal repeat (LTR)

retrotransposons comprise 18.9% of the genome (Gypsy 9.8% and Copia 4.8%), and

non-autonomous LTR retrotransposons comprise 19.8% of the genome (Table S6).

Somewhat lower percentages of non-autonomous LTR retrotransposons have been observed

in the coffee (10%) (Denoeud et al., 2014) and Eucalyptus (15%) (Myburg et al., 2014)

genomes. By estimating the insertion time of identified LTR retrotransposons based on the

divergence of LTR sequences (Sanmiguel et al., 1998), we found relatively lower divergence

of Copia than Gypsy elements, suggesting a recent or ongoing wave of retrotransposition for

Copia elements (Figure S5). Considering that Copia elements had inserted in both gene-rich

and gene-poor regions (Figure 1), their activities represent a potential source of genetic and

epigenetic variation (Lisch, 2013).

To facilitate gene annotation, we conducted12 transcriptomes of six different organs or

tissues, including root, leaf, flower, pericarp, and the inner and outer seed coats. Next, we

created protein-coding gene models by combining de novo and homology-based prediction

with RNA-Seq data to train the prediction program. In total, 29,229 protein-coding gene

models with an average gene length of 2574.61 bp were annotated (Table S7), excluding

sequences similar to known TEs. Functional annotations for 62.52% of these predicted genes

were found in the SwissProt database (http://www.uniprot.org/) (Table S8).

Genome evolution

Page 8: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

The 29 229 annotated gene models were clustered by sequence similarity and yielded 13

640 gene families comprised of at least two genes. A total of 6230 orphan pomegranate genes

were identified that could not be clustered with any genes from P. granatum, A. thaliana,

Carica papaya, Eucalyptus grandis, Malus domestica (apple), Oryza sativa (rice), Populus

trichocarpa (poplar), Theobroma cacao (cocoa), Vitis vinifera (grape), Ziziphus jujuba

(jujube), Solanum lycopersicum (tomato), or Prunus persica (peach) (Table S9). We

conducted analysis of Gene Ontology (GO) annotations for these orphan genes and found

they were enriched in GO terms including catalytic activity, metabolic process, cellular

process, and cell and cell part (Figure S6). KEGG pathway enrichment analysis of these

orphan genes indicated that they were enriched mainly in pathways for biosynthesis of

secondary metabolites and phenylpropanoids (Table S10). Polyphenolic compounds

including flavonoids, anthocyanins, and tannins are the main secondary metabolites in

pomegranate, and have potential antiviral, antifungal, or antibacterial properties

(Johanningsmeier and Harris, 2011). The orphan genes identified here might be responsible

for the biosynthesis of these secondary metabolites, and could ultimately have contributed to

the adaptation of pomegranate.

A comparison of pomegranate, Eucalyptus, grape, tomato, and Arabidopsis sequences

indicated 8854 gene families in common among these species and 1109 gene families unique

to pomegranate, while 1028 were unique to grape, 953 were unique to Arabidopsis and 1435

were unique to tomato (Figure S7). To study gene families that had expanded or contracted

only in pomegranate, we compared gene families from pomegranate to those of 11 other

representative species and to ancestors species. In total, 546 gene families had expanded, and

1926 gene families had contracted in the pomegranate genome compared to its most recent

common ancestor (Figure 2). Results of KEGG pathway enrichment analysis revealed that

these expanded gene families were concentrated in phenylpropanoid biosynthesis (Table

Page 9: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

S11), an observation consistent with the enrichment of genes unique to pomegranate. This

enrichment information provided vital insights into secondary metabolism in pomegranate.

To address the phylogenetic position of pomegranate, we performed whole-genome

analysis of 12 sequenced plant genomes, and generated 509 orthologous gene clusters.

Pomegranate, just as Eucalyptus, is in the order Myrtales, which is classified as a sister taxon

to the rosids (Figure 2), but not to the Malvids clade within the eurosids as in the Angiosperm

Phylogeny Group (APG, IV) classification (Angiosperm Phylogeny Group, 2016). The

phylogenetic position of pomegranate based on this genome-wide analysis provided more

evidence as to the grouping of Myrtales, and is in agreement with other recent whole-genome

studies (Argout, et al., 2011). The discrepancy between these classifications highlights the

trade-offs between the two methodologies (Martin et al., 2005), because the genetic evidence

used in the APG classification is based on single genes or on chloroplast, plastid,

mitochondrial, and nuclear ribosomal genes (Soltis et al.,2011).

Based on previous studies of pairwise synonymous substitution rates (Ks values)

between paralogs, we calculated the fourfold synonymous third-codon transversion (4DTv)

rates (Figure S8) to determine the age of the pomegranate WGD based on diversification of

these nucleotide sequences. A lineage-specific WGD and additional small-scale duplication

events were detected in pomegranate that had occurred since its divergence from grape.

In addition to the temporal evidence such as the distribution of 4DTv rates, spatial

evidence such as cross-species synteny further supports the lineage-specific WGD. Synteny

analyses between the genome of pomegranate and those of grape and peach showed clear

evidence of a single WGD event in the pomegranate lineage (Figure 3). For each genomic

region in grape and peach, we typically found two matching regions in pomegranate with a

similar level of divergence and gene-level fractionation (Figure S9a and S10a). The genomes

Page 10: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

of grape and peach all lacked recent WGDs after the genome triplication event (the γ event)

that is shared among all core eudicots (Argout et al., 2011; Jaillon et al., 2007; Jiao et al., 2012;

Verde et al., 2013). The overall 2:1 synteny patterns between pomegranate and grape or peach

suggest that pomegranate had experienced another WGD after its divergence from these

related rosid genomes, which lacked recent WGDs. The pomegranate WGD was further

confirmed by comparing the pomegranate genome to itself (Figure S9b) and readily

identifying large duplicated blocks of relatively recent origin.

The dating of the pomegranate WGD can be further narrowed down by comparison to the

related Eucalyptus genome. Both Eucalyptus and pomegranate belong to the order Myrtales

under the APG IV classification (Angiosperm Phylogeny Group 2016; Magallon et al., 2015),

a conclusion that is also well supported by our whole-genome phylogeny. A previous study

inferred that Eucalyptus had undergone a lineage-specific WGD event, estimated to have

occurred 109.9 (105.9–113.9) million years ago (mya) (Myburg et al., 2014). With the

pomegranate genome sequence, we found that the most recent WGDs in Eucalyptus

and pomegranate occurred in their common ancestor. The Eucalyptus genome had undergone a

WGD after the known paleo-hexaploidization event γ (Myburg et al., 2014), and also showed a

clear 2:1 synteny pattern when compared to the genomes of grape or peach (Figure 3). The

similar 2:1 synteny patterns suggested that Eucalyptus and pomegranate might once have had

the same ploidy level (Figure 3 and Figure S10b). Additionally, a clear 1:1 synteny pattern was

observed in a Eucalyptus-pomegranate comparison by reciprocal best BLAST matches (Figure

S9c and S10c), indicating that the WGD predated their divergence, i.e., they shared a WGD

event. If the WGDs in these two species were independent events that occurred after the

divergence of Eucalyptus and pomegranate, we would expect instead to see a 2:2 synteny

Page 11: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

pattern. Thus, the observed 1:1 synteny pattern excludes the possibility of two separate WGD

events.

The WGD in the Myrtales, first discovered in the Eucalyptus genome, has now also been

observed in the pomegranate genome. The typical macro- and micro-synteny pattern reflected

a 1:2 relationship between genomic regions from grape or peach compared to the two

Myrtales genomes (Figure 3). The pomegranate genome sequences have provided more

precise timing for the Myrtales WGD event, which has now been placed closer to the origin

of this clade.

Disease resistance genes and transcription factors in the pomegranate genome

Plants have evolved a wide range of resistance mechanisms in their constant struggle for

survival. Beyond simple physical or chemical barriers, more sophisticated biochemical

mechanisms are based on gene-for-gene interactions between plants and infectious agents. By

querying specific protein domains and applying classification schemes (Sanseverino et al.,

2010; Lehti-Shiu and Shiu, 2012), we identified 710 (2.43 % of the gene models) R genes and

995 (3.40% of the gene models) genes encoding protein kinases in the pomegranate genome

(Table S12 and S13), numbers similar to those in the grape and Arabidopsis genomes (Jaillon

et al., 2007; Arabidopsis Genome Initiative 2000). We mapped the putative R genes to nine

pseudo-chromosomes and found that they were nonrandomly distributed (Figure S11). The

observed enrichment of R genes in these certain genomic regions indicated that R genes

might have evolved by tandem duplication and subsequent divergence of linked gene

families.

Page 12: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Transcription factors (TFs) and transcriptional regulators (TRs) also play key roles in

adaption to environmental changes by regulating the transcription of their target genes in a

spatiotemporal manner. Based on the comprehensive rules for TF identification and

classification in the Plant Transcription Factor Database (PlantTFDB 3.0, Jin et al., 2014), we

identified a total of 1867 (6.39% of the gene set) TFs in the pomegranate genome. For

comparison, the total numbers of TFs in grape, Arabidopsis and Eucalyptus were 1690

(Jaillon et al., 2007), 2195 (Arabidopsis Genome Initiative 2000) and 2207 (Myburg et al.,

2014), respectively (Table S14).

To analyze the evolution of transcription factors, we explored the evolutionary patterns of

the MADS-box superfamily in pomegranate. MADS-box TFs are central controllers of floral

organ identity (Smaczniak et al., 2012). The development and growth of floral organs is of

great importance from both biological and agricultural perspectives, for the reproduction and

spread of species and the improvement of yield in seed crops, respectively. Over recent

decades, extensive genetic and molecular research have supported the elaboration of a

framework for the development of flower organ identity, known as the ABC(E) model

(Cucinotta et al., 2014). Variation in flower structure is key to the evolutionary success of

angiosperms (Yan et al., 2016). Genome-wide genetic information could provide a global

view of the gene regulatory networks necessary for specifying flower organ development in

pomegranate. Specific MADS-box gene lineages in pomegranate have been duplicated and

become diversified. In particular, we see dramatic expansions in the copy number of type I

homologs of the Mα and Mγ subfamilies, although this type of expansion is relatively

common for the type I loci (Figure S12a). Members of these lineages control aspects of

gametophyte and endosperm development that appear to evolve quite rapidly (Masiero et al.,

2011). More surprising are the large expansions observed in the AGL22/SHORT

Page 13: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

VEGETATIVE PHASE (AGL22/SVP) subfamily, although independent increases in copy

number in this subfamily were also observed in both Vitis and the close pomegranate relative

Eucalyptus (Figure S12b). AGL22/SVP homologs contribute to the regulation of flowering

time in Arabidopsis, but participate in other aspects of floral development, including

inflorescence structure and fruit abscission (Thouet et al., 2012) in other dicots. Further study

of these recent paralogs will be necessary to understand how they might contribute to

pomegranate development.

Development of seed coats in pomegranate

The ovules of pomegranate are bitegmic; that is, they possess two integuments that

develop into the distinct inner and outer seed coats of pomegranate seeds that protect the

endosperm and embryo and contribute to seed dormancy, dispersal, and germination

(Debeaujon et al., 2000). The soft, edible outer coat of pomegranate is highly vacuolized

and is rich in sugars, organic acids, anthocyanins, and minerals (Fawole and Opara, 2013a).

The compact cells of the inner seed coat show little to no vacuolization and are enriched in

lignin and fiber. The molecular mechanisms of integument and seed coat development were

previously unknown because of limited molecular genetic information available for

pomegranate. The genomic and transcription data described here will provide a basis for

further insights into the molecular mechanisms of integument and seed coat development in

pomegranate.

Lignin, cellulose, and hemicellulose comprise a substantial portion of seed biomass and

determine seed hardness (Hu et al., 1999; Perez et al., 2002). We extracted genes involved in

lignin, cellulose and hemicellulose metabolism, and conducted transcriptomic analysis of the

inner and outer seed coats of the hard-seeded pomegranate cultivar ‘Dabenzi’ at 50, 95, and

Page 14: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

145 days after pollination (DAP), and of the inner seed coat of the semi-soft-seeded cultivar

‘Baiyushizi’ and the soft-seeded cultivar ‘Tunisi’ at 50 DAP (Table S15). The sets of genes

encoding the key enzymes involved in lignin, cellulose, and hemicellulose synthesis and

degradation were nearly the same in the outer and inner seed coats. Expression of genes

involved in lignin, cellulose, and hemicellulose synthesis increased regardless of stage of

development or cultivar, as did those involved in degradation of these molecules (Figure 4).

Thus, the structural differences in seed coats among cultivars might be related to the relative

abundance of transcripts encoding enzymes catalyzing lignin, cellulose, and hemicellulose

synthesis and degradation.

Seed size, which can depend on seed coat development, is important for both fitness and

yield (Xia et al., 2013). The functions of genes involved in the development of the inner or

outer integument have been shown previously in other plant species (Prasad et al., 2010;

Villanueva et al., 1999). In pomegranate, genes orthologous to FLORAL BINDING

PROTEIN 24 (FBP24) and ABERRANT TESTA SHAPE (ATS) are expressed during inner

integument development, whereas genes orthologous to INNER NO OUTER (INO), SEUSS

(SEU), and KANADI1/2 (KAN1/2) are expressed during outer integument development (Table

S16). Plant organs develop through processes involving organized cell growth, proliferation,

and expansion (Prasad and Ambrose 2010). The expression of genes involved in integument

development (Endress 2011a; Prasad and Ambrose 2010; Skinner et al., 2004) has been

analyzed for inner and outer seed coats and flowers to capture early developmental stages of

the inner and outer seed coats. TRANSPARENT TESTA GLABRA 1 (TTG1) and

TRANSPARENT TESTA GLABRA 2 (TTG2) (Gonzalez et al., 2009), which are involved in

integument development through cell expansion, and BIG BROTHER (BB) and DA 1 (which

means ‘large’ in Chinese) (Li et al., 2008; Xia et al., 2013), which affect integument

development through repression of cell expansion, are all expressed dynamically in the inner

Page 15: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

and outer seed coats of pomegranate. The dynamic expression patterns of these genes during

integument development indicated that DSO, BELL1 (BEL1), TTG1, BB, YAB1, and YAB2

might contribute to inner seed coat development, and that INNER NO OUTER (INO), DSO,

KAN1, KAN2, and TTG2 might contribute to outer seed coat development in pomegranate

(Figure 4).

The fleshy, edible outer part of the pomegranate seed coat is unique among fruits,

compared to the receptacle of pear and apple, the ovary of peach and apricot, and the aril of

lychee, which represent the edible portions of those fruits. Genes in the YABBY family exhibit

polar expression in the apical meristem and flower primordium (Siegfried et al., 1999;

Villanueva et al., 1999) and the functions of INO, YABBY1 (FIL/AFO), YABBY2 (YAB2), and

YABBY3 (YAB3) in integument growth have been shown in Arabidopsis (Meister et al.,

2005). Given the highly conserved YABBY genes in eudicots (Toriba et al., 2007), no obvious

expansion of the YABBY gene family was observed in the pomegranate genome (Figure 5a).

The amino acid sequences of putative YABBY orthologs from 16 species belonging to 10

families in 14 genera (including pomegranate, apple, Eucalyptus, grape, and Arabidopsis)

were aligned (Figure S13 and S14) for analysis of selection using PAML (V4.8) software. An

amino acid site under positive selection was detected at the 35th amino acid residue (Q) in the

predicted protein encoded by the pomegranate INO gene (Figure 5b). INO is essential for the

development and asymmetric growth of the outer integument and is critical for outer

integument development (Mcabee et al., 2005). Polar expression of INO during outer

integument development is regulated by the expression of ANT and BEL1 (Meister et al.,

2004; Sieber et al., 2004; Skinner et al., 2004). Positive selection has also been detected in

the predicted protein encoded by the pomegranate BEL1 gene (Figure S15). Positive selection

acting on the INO, and its regulator BEL1 might account for the polar expression of INO

Page 16: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

during integument development, which ultimately affects the expansion of the fleshy outer

seed coat of pomegranate.

The TPS gene family

The biosynthesis of terpenes, catalyzed by terpene synthase (TPS), is an important

metabolic pathway in plants. Terpenes are important volatiles with a variety of roles in

mediating both antagonistic and beneficial interactions among organisms (Gershenzon and

Dudareva 2007). Monoterpenes such as α-pinene, β-myrcene, and limonene are major

components of pomegranate flavor (Kirshinbaum, 2012; Melgarejo et al., 2011) that appeal

to consumers. Monoterpenes such as 1,8-cineole from Eucalyptus can act as powerful natural

antimicrobials and pesticides (Batish et al., 2008). To characterize the TPSs that participate in

terpene biosynthesis, we predicted and compared TPS genes from the pomegranate and

Eucalyptus genomes, and from their ancestral lineage, grape, with those from the Arabidopsis

genome. The TPS genes of pomegranate were evolutionarily conserved in every subfamily

branch of the TPS gene family. The phylogenetic tree indicated that the TPS-b and TPS-g

subfamilies in pomegranate were the result of post-speciation gene duplications and

functional diversification during evolution. Although gene duplication may be the

predominant pattern for evolution of genes in the TPS-a subfamily (Figure S16) that produces

monoterpenes in grape (Martin et al., 2010), the different monoterpenes produced in

pomegranate and Eucalyptus may not attribute to the differential evolutionary of TPSs.

Page 17: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Anthocyanin metabolism

Anthocyanins are the major pigments responsible for the color of pomegranate fruits and

flowers. In addition to their dietary antioxidant properties, anthocyanin pigments function to

attract pollinators to flowers and seed dispersers to fruits (Winkel-Shirley, 2001).

Anthocyanin metabolism in the plant phenylpropanoid pathway has been studied extensively,

and many genes involved in the pathway have been isolated (Zhao et al., 2015). Studies of

anthocyanin biosynthesis have revealed that the expression of late genes in the anthocyanin

biosynthetic pathway depend on the activity of WD-repeat/MYB/bHLH transcriptional

complexes (Gonzalez et al., 2008; Pelletier et al., 1999). In the present study, we predicted

gene models for the late anthocyanin structural genes and their transcription factors and

studied their expression patterns in flowers and outer seed coats at different developmental

stages (50, 95, and 140 DAP) (Table S17and S18). The expression of several genes in the

anthocyanin metabolism pathway encoding dihydroflavonol 4-reductase (DFR),

leucoanthocyanidin dioxygenase (LDOX/ANS), anthocyanidin 3-O-glucosyltransferase

(BZ1), and anthocyanidin 3-O-glucoside-5-O- glucosyltransferase (UGT75C), in addition to

genes encoding MYB, basic Helix-Loop-Helix (bHLH) and WD40-repeat, increased in

concert with the accumulation of anthocyanins in the outer seed coats during development

(Figure 6). These genes are therefore considered candidate genes for anthocyanin

biosynthesis in pomegranate. The expression patterns of candidate genes and TFs involved in

anthocyanin metabolism in pomegranate flowers and outer seed coats indicated that different

candidate genes and TFs contribute to the anthocyanin accumulation in flower and outer seed

coats. This information will be useful for further studies of anthocyanin metabolism in

pomegranate.

Page 18: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Punicalagin metabolism

Pomegranate is an important source of bioactive compounds and has been used in

medicine for many centuries (Longtin, 2003). Its potential pharmaceutical properties are

attributable to high contents of polyphenols, such as ellagitannins. Gallic acid is a common

precursor for EA and punicalagin synthesis, although its metabolism in plants is still an open

question. We have drafted a gallic acid biosynthesis pathway according to KEGG and the in

vivo reactions that have been reported (Mishra and Gold 2006; Ossipov et al., 2003), and

have surveyed genes encoding the corresponding enzymes and their expressions in different

tissues of pomegranate (Table S19). Genes encoding the bifunctional enzyme dehydroquinate

dehydratase/shikimate dehydrogenase (EC 4.2.1.10/1.1.1.25) (Singh and Christendat, 2006),

3-dehydroshikimate dehydratase (EC 4.2.1.118), dehydroxylase (1.17.98.x), and

methyltransferase (EC 2.1.1.-) were found in the pomegranate genome, so gallic acid in

pomegranate might be derived from shikimic acid and syringate, but not quinic acid.

Pgr015317.4 was the only gene encoding 3-dehydroshikimate dehydratase identified in

pomegranate genome.Its expression implied that gallic acid in root and flower might be

derived from shikimic acid, while that in leaf might be derived from syringate. The

expression of genes encoding these enzymes in the pericarp and in the inner and outer seed

coats at different developmental stages indicated that the two biosynthetic pathways might

both occur in the outer seed coat, while the gallic acid in inner seed coat and perpicarp might

be derived from syringate at early stages of development and shikimic acid at later stages of

development (Figure 7).

Derivatives of gallic acid, EA and punicalagin, have been identified in different

pomegranate tissues, but their metabolism has not yet been fully characterized.

Carboxylesterase (EC 3.1.1.1), glycosyltransferase (EC 2.4.1.x), acyltransferase (EC 2.3.1.x),

and glucohydrolase (EC 3.2.1.21) are candidate enzymes in the punicalagin biosynthesis

Page 19: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

pathway (Figure 7). Because UDP-glycosyltransferases (UGT) that form glucose esters with

benzoates belong to the phylogenetic group L (Lim et al., 2002), genes encoding

galloyl-O-β-d-glucosyltransferase (UGGT) in pomegranate probably also belong to group L.

We detected 13 genes encoding UGGT in the pomegranate genome (Figure S17). Similarly,

genes encoding an acyltransferase that uses hydroxycinnamoyl/ benzoyl CoA as an acyl

donor (D'auria, 2006) might participate in the transfer of O-galloyl groups during punicalagin

synthesis. We identified eight genes encoding this acyltransferase in the pomegranate genome

(Figure S18). One gene encoding carboxylesterase, an enzyme that interconverts gallic acid

and ellagic acid, was identified in the pomegranate genome. Its low expression in the inner

seed coat implied that little interconversion occurred in this tissue, which might account for

its low ellagic acid contents (Fischer et al., 2011). The pathway for punicalagin metabolism

and candidate genes predicted here will provide invaluable information for the genetic

improvement of production of bioacitive compounds in pomegranate.

Punicalagins are hydrolyzed to EA in the gut, and then EA is then gradually metabolized

by the intestinal microbiota to produce different types of urolithins (Cerda et al., 2004). This

process could include lactone-ring cleavage, decarboxylation and dehydroxylation reactions

(Selma et al., 2009), and then tannin acetylhydrolase (EC 3.1.1.20), dehydroxylase (EC

1.17.98.x) and gallate decarboxylase (EC 4.1.1.59) are candidate enzymes that might

participate in the metabolism of EA to urolithins (Figure 7). We surveyed genes encoding

these enyzmes in intestinal microflora, such as Lactobacillus plantarum, Lactobacillus

paraplantarum, Clostridium scindens, and Debaryomyces hansenii and obtained orthologous

genes encoding acetylhydrolase and dehydroxylase, but no reference sequences encoding

gallate decarboxylase could be found in the data for these microflora. Refer to the domains of

reference sequences, we identified 33genes encoding dehydroxylase in pomegranate genome,

but no gene encoding acetylhydrolase. Alignment of pomegranate gene sequences for

Page 20: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

dehydroxylase to the sequences from microflora (Figure S19) showed that a low degree of

conservation of genes encoding dehydroxylase among kingdoms might account for whether

these enzymes function in punicalagin degradation in different species. Lacking gene

encoding acetylhydrolase may also be responsible for the lacking punicalagin degradation in

pomegranate.

DISCUSSION

Pomegranate, an ellagitannin-rich fruit crop, is a widely used source of dietary

polyphenols. An analysis of the antioxidant activity of various dietary supplements has

shown that the antioxidant activity of pomegranate juice exceeded that of orange juice

(Parashar, 2011), milk thistle, green tea, grape seed, goji, acai (Henning et al., 2014), and

infusions of red wine or green tea (Gil et al., 2000). In addition to antioxidant function, EA

and punicalagins in pomegranate have been implicated as potent anticancer and

anti-atherosclerotic agents (Adams et al., 2010; Usta et al., 2013; Viuda-Martos et al., 2010).

Although the potential health benefits of punicalagins have been recognized, the biosynthesis

of punicalagins has not yet been thoroughly characterized. Based on the pathways for

biosynthesis of gallic acid and the structures of its derivatives, we predicted the punicalagin

biosynthetic pathway and identified candidate genes encoding enzymes in this pathway. The

pomegranate genome sequence offers an unprecedented opportunity to validate these

candidate genes and fully characterize punicalagin metabolism, which will provide

knowledge and tools for pomegranate breeding to increase punicalagin production.

Punicalagins are metabolized into bioavailable hydroxy-6H-dibenzopyran-6-one

derivatives (urolithins) by the colonic microflora in healthy humans. However, its precursors,

punicalagin, and ellagic acid, could not be detected in either plasma or urine (Espin et al.,

Page 21: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

2013), although that could be due to rapid excretion (Seeram, et al., 2004). Health benefits of

urolithins in humans and animals have previously been reported (Espin, et al., 2013;

Kasimsetty et al., 2010; Ryu et al., 2016), and the putative pathway for urolithin metabolism

has provided a point of reference to understand their function and utility.

Besides their nutrient and potential health-promoting effects, secondary metabolites

including ellagitannins, fragrant monoterperenes, and colored anthocyanins in flowers and

fruits also play a vital role as defense (against herbivores, microbes, and competing plants)

and signal compounds (to attract pollinating or seed-dispersing animals) (Wink 2003). The

occurence of these secondary metabolites could reflect adaptation and particular life

strategies within a particular plant phylogenetic framework. The genome of pomegranate now

provides powerful tools for identification of genes on a large-scale and elucidation of

pathways for these secondary metabolites.

Analysis of adaptive evolution and survival mechanisms in plants are long-standing

research objectives. Enhancing stress resistance and modifying morphological structures via

gene duplication and neofunctionalization are strategies that have allowed plants to resist or

tolerate biotic and abiotic stresses during evolution (Benderoth et al., 2006). As the key

feature of angiosperm seeds, the integument can be traced back to the evolution of the earliest

angiosperm plants (Endress, 2011a). Pomegranate has bitegmic ovules (Endress, 2011b), and

its outer integument has evolved into a fleshy, juicy, and expanded outer seed coat, the edible

part of the fruit. From a molecular developmental genetics perspective, the outer integument

is thought to have arisen due to the assymmetric expression of INO, a marker gene for outer

integument development (Eshed, 2001). Positive selection acts on the frequency of mutations

and can result in species diversification that provides an evolutionary strategy to cope with

Page 22: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

challenges imposed by biotic and abiotic stresses. Positive selection operating on the INO

gene in pomegranate might have contributed to expansion of the outer seed coat that attracts

herbivores to disperse seeds and contributes to seed size and yield, with subsequent effects on

species fitness and survival.

This high-quality genome of pomegranate will facilitate future investigation into its

adaptation and biosynthesis of secondary metabolites, and will provide valuable information

for elucidating many other desirable traits in this nutritious fruit crop with medicinal

properties.

EXPERIMENTAL PROCEDURES

Plant materials, DNA and RNA preparation

DNA extracted using CTAB method (Murray and Thompson 1980) from fresh leaf

tissues of pomegranate cultivar ‘Dabenzi’ was used for whole-genome shotgun sequencing.

RNAs extracted using the Plant RNA Isolation Kit (AutoLab, Beijing, China) were used for

transcriptome analysis of the following tissues: seeds of the semi-soft seeded cultivar

‘Baiyushizi’ (BY) and the soft-seeded cultivar ‘Tunisi’ (T) at 50 day after pollination (DAP);

and leaf (L), root (R), flower (F), and pericarp (P), inner seed coat (ISC), and outer seed coat

(OSC) at 50, 95, and 140 DAP of the hard-seeded cultivar ‘Dabenzi’ (DB). Flower and inner

seed coat at 50, 95, and 140 DAP of ‘Dabenzi’ were used for total anthocyanin content

determination. Outer seed coat at 50, 65, 80, 95, 110, 125, and 140 DAP of ‘Dabenzi’ were

used for seed hardness determination. Pericarp, inner seed coat and outer seed coat at 50, 95,

and 140 DAP of ‘Dabenzi’ are abbreviated as DB-P50, DB-P95, DB-P140, DB-ISC50,

DB-ISC95, DB-ISC140, DB-OSC50, DB-OSC95, and DB-OSC140, respectively (Table

S20).

Page 23: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Genome sequencing and de novo assembly

We carried out whole-genome shotgun sequencing using the Illumina HiSeq 2000

platform. Two paired-end libraries with 170-bp or 500-bp inserts, and five mate-paired

libraries with 2-kb, 5-kb, 10-kb, 20-kb, or 40-kb inserts were constructed and sequenced.

Total sequencing depth was about 305.52×, considering the predicted genome size of 357 Mb

according to the k-mer method.

The pomegranate genome was assembled using SOAPdenovo2 (Luo et al., 2012)

(Version_2.04) with clean reads from seven libraries. Gap Closer

(http://sourceforge.net/projects/soapdenovo2/files/GapCloser/) was used to fill gaps based on

paired-end reads. This generated assembly V1.0 (Table S21).

Estimating the pomegranate genome size by flow cytometry (FCM)

The genome size of pomegranate cultivar ‘Dabenzi’ was estimated by FCM, using the

genome of Arabidopsis thaliana Col-0 as reference. Fluorescence intensity determination is

shown in Figure S3. The genome size calculation for pomegranate was performed as:

250.46/95.41 × 125 Mb = 328.13 Mb, which was close to the k-mer predicted size.

Genetic map construction and chromosome assembly

A genetic map was constructed using 195 F1 lines developed from a cross between

‘Dabenzi’ and ‘Baiyushizi’ by a reduced representation restriction-associated DNA (RAD)

sequencing method using the RAD-Seq protocol (Baird et al., 2008).

Page 24: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

SNP-calling for each individual RAD read was carried out using a pipeline. Each

individual’s paired-end RAD reads were mapped onto the assembled reference genome using

the alignment software SOAP2 (Li et al., 2009). SNP calling was performed using SOAPsnp

(Li et al., 2009). Markers (SNPs) showing significantly distorted segregation (p-value < 0.01)

according to Chi-square tests were excluded from use in map construction. Markers with

missing rates of less than 50% were used for linkage analysis using JoinMap 4.0 software

(https://www.kyazma.nl/index.php/JoinMap/General/) with CP (outbreeder full-sib family)

population type codes and the double pseudo-test cross strategy was applied (Grattapaglia

and Sederoff, 1994). Linkage groups were designated at a logarithm of the odds (LOD)

threshold of 6.0 and ordered using the regression-mapping algorithm.

Genome quality evaluation

Evaluation of assembly redundancy. We aligned the entire assembled genome against

itself using the MUMmer3.23/nucmer (Kurtz et al., 2004) program with parameters

-maxmatch -c 90 -l 20 to assess genome redundancy. Sequences were considered redundant if

they met all of the following criteria: (i) Match length ≥200 bp; (ii) Match identity ≥90%;

and (iii) one contig or scaffold totally contained within another contig or scaffold.

Evaluation of gene region coverage. Assembled unigenes derived from RNA-Seq data

for the 12 tissues described above in Plant materials were used to verify the completeness of

the genome assembly. A total of 105 472 unigenes were mapped to the assembled genome

using BLAT software (Kent, 2002) under default parameters. Analysis was performed at

three different percent sequence homology levels and percent coverage using custom Perl

scripts.

Page 25: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Genome annotation

Annotation of repeats. De novo and homology-based approaches were used to identify

the content of repeats in the pomegranate genome. Firstly, a pomegranate de novo consensus

repeat database was built using the software packages LTR_FINDER (Xu and Wang 2007)

(Version 1.0.6), PILER (Edgar and Myers 2005) (V1.0), and RepeatScout (Price et al., 2005)

(Version 1.0.5). Transposable elements were then identified using RepeatMasker and

RepeatProteinMask Version 3.3.0 (http://www.repeatmasker.org/) by using the Repbase

database (Jurka et al., 2005) to query the pomegranate genome (Table S22 and S23). Tandem

repeats in the genome were annotated using TRF (Benson 1999) (Version 4.04). The REPET

TEdenovo package (V2.2) (Flutre et al., 2011; Quesneville et al., 2005) was also used for de

novo identification and classification of transposable elements.

To infer insertion times of LTR retrotransposons, 496 full-length LTR retrotransposons

were identified using LTR_STRUC (Mccarthy and Mcdonald, 2003) with default parameters

and classifications (Figure S5).

Gene prediction. We used both homology-based and de novo methods for gene

prediction. For homology-based prediction, translated proteins from Arabidopsis thaliana

(Arabidopsis Genome Initiative, 2000), apple (Malus domestica) (Velasco et al., 2010),

tomato (Solanum lycopersicum) (The Tomato Genome Consortium, 2012), Eucalyptus

grandis (Myburg et al., 2014) and grape (Vitis vinifera) (Jaillon et al., 2007) were mapped

onto the assembled pomegranate genome using tblastn (Mount, 2007), then the program solar

was used to concatenate the several high-scoring segments (HSP) to the completed region for

each gene, and then we included 1-kb upstream and downstream regions to locate start and

stop codons and ensure that each gene region was complete. Finally, the GeneWise 2.2.0

(Birney and Durbin 2000) was used to predict gene structures and define gene models based

Page 26: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

on these complete regions. For de novo gene prediction, transposons and repeats were first

masked using RepeatMasker and RepeatProteinMask. Then AUGUSTUS (Stanke et al.,

2006) and SNAP (Majoros et al., 2004) were performed using a Hidden Markov Model

(HMM) trained by complete pomegranate genes (homolog predicted genes). These

complementary gene sets from homolog-based and de novo predictions were merged to

produce a non-redundant reference gene set using GLEAN

(http://sourceforge.net/projects/glean-gene/). In addition, RNA-Seq data from the 12 tissues

described above in Plant materials were also incorporated to aid gene annotation. The

RNA-Seq data were mapped to the assembled genome using TopHat (Trapnell et al., 2009),

and transcriptome-based gene structures (Figure S20) were obtained using Cufflinks

(http://cufflinks.cbcb.umd.edu/). We then compared this gene set with the previous GLEAN

gene set using Cuffcompare program to obtain the final non-redundant gene set (Table S7).

Functional annotation. To annotate predicted pomegranate genes, we aligned predicted

proteins translated from each gene to the SwissProt and TrEMBL databases (UniProt release

2015-04) using blastp (E-value ≤1e-5), and the function of the best hit was assigned to each

gene. We also annotated motifs and domains using InterProScan (InterProScan-5.11-51.0)

(Jones et al., 2014) by searching against publicly available databases, including Pfam

(Mistry and Finn 2007), PRINTS (Attwood et al., 1994), PANTHER

(http://www.pantherdb.org/), HAMAP (Pedruzzi et al., 2015), Gene3D (Lam et al., 2016),

PROSITE (Hulo et al., 2006), SUPERFAMILY (Gough et al., 2001), ProDom (Bru et al.,

2005), and SMART (Schultz et al., 1998). Gene Ontology (Ashburner et al., 2000) functional

annotations were retrieved from InterProScan. We also mapped all of the pomegranate genes

to KEGG pathways by searching the KEGG databases (Release 76) (Kanehisa and Goto,

2000) and identifying the best hit for each gene.

Page 27: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Genome evolution analysis

Gene sets from P. granatum, A. thaliana (Arabidopsis Genome Initiative, 2000)

(TAIR10), E. grandis (Myburg, et al., 2014) (JGI_9.0), C. papaya (Ming et al., 2008)

(JGI_8.0), V. vinifera (Jaillon et al., 2007) (Genoscope 12X), S. lycopersicum (The Tomato

Genome Consortium, 2012) (ITAG2.3_release), O. sativa (Goff et al., 2002) (IRGSP1.0), M.

domestica (Velasco et al., 2010) (JGI_9.0), P. trichocarpa (Tuskan et al., 2006) (JGI_9.0),

P. persica (Verde et al., 2013) (JGI_7.0), T. cacao (Argout, et al., 2011) (CIRAD_v0.9), and

Z. jujube (Liu et al., 2014) were used for analyses of genome evolution, including gene

clustering, phylogeny construction, divergence time estimation, and identification of

chromosome collinearity.

Gene clusters were identified using OrthoMCL (Li et al., 2003) with inflation value set

to 1.5 and using other default parameters. Gene families in these 12 species (Table S9) and

shared gene families in a subset of five species (P. granatum, E. grandis, A. thaliana, V.

vinifera, and S. lycopersicum) were also generated (Figure S7). Based on the OrthoMCL gene

clusters, the proteins of 509 single-copy gene families were aligned using MUSCLE (Edgar

2004) (http://www.ebi.ac.uk/Tools/msa/muscle/), then fourfold degenerate sites (4dTv) were

extracted and orthologous single-copy genes in each species were used for phylogeny

construction and divergence time estimation. The software MrBayes (Huelsenbeck and

Ronquist, 2001) was chosen to construct the phylogenetic tree of orthologs using the GTR

model. To validate this phylogenetic tree, we also constructed other phylogenetic trees using

PhyML (Guindon et al., 2010) software based on single-copy genes (Figure S21).

A Markov Chain Monte Carlo algorithm for Bayesian inference was implemented to

estimate the divergence times of species using the MCMCTree program in the PAML

package (Yang 2007) with a fixed corrected time point of ~71.9 (~70–74) mya as the time of

Page 28: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

the split between Arabidopsis and papaya used for the calibration time (Wikstrom et al.,

2001). The phylogenetic relationships among these species and the estimates of divergence

time between species are shown in Figure S22. The time of the split between pomegranate

and Eucalyptus is estimated at 91.6 (66.4–112.6) mya within the Myrtales branch of the

rosids.

MCScan (http://chibba.agtec.uga.edu/duplication/mcscan) was used to construct blocks

of chromosome collinearity between pomegranate, Eucalyptus, and grape. Syntenic blocks

containing at least five genes were identified based on similarities between gene pairs. All of

the orthologous or paralogous gene pairs were extracted from the syntenic blocks, then used

to calculate the 4dTv (Huang et al., 2009) distances using the HKY substitution model

(Hasegawa et al., 1985).

We extracted similar gene pairs using the OrthoMCL gene families of pomegranate

(blastp: E <1e-15, identity >40%, match length >100 amino acids) after removing tandem

duplicated gene pairs, then calculated the average synonymous substitutions per synonymous

site (Ks) of the 5570 duplicated gene pairs in the pomegranate genome (Figure S23).

We performed pairwise genome alignments (Phytozome version 11) (Goodstein et al.,

2012) between the grape and Eucalyptus genomes and the pomegranate genome. For each

pairwise alignment, the coding sequences of predicted gene models were compared using

LAST (Kielbasa et al., 2011). Our synteny search pipeline defines syntenic blocks by

chaining the LAST hits with a distance cutoff of 20 genes, and requires at least four gene

pairs per synteny block. For homologs inferred from syntenic alignments, we aligned the

protein sequences using CLUSTALW (Larkin et al., 2007) and used the protein alignments to

Page 29: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

guide coding sequence alignments in PAL2NAL (Suyama et al., 2006). Finally, we used the

Nei-Gojobori (Nei and Gojobori, 1986) method implemented in the yn00 program of the

PAML package (Yang 2007) to calculate Ks values.

Functional enrichment of genes duplicated in WGD

We extracted all genes in the syntenic blocks of the M-WGD, the γ-WGD, and recent

small-scale gene duplication events in the pomegranate genome, and then carried out

functional enrichment analysis using the KEGG pathways (Kanehisa et al., 2008) and Gene

Ontology (Conesa et al., 2005) (GO) databases with Q-value (corrected p-value) <0.01

(Table S24-25 and Figure S24-25).

Expansion and contraction of gene families

CAFÉ (De Bie et al., 2006) (V2.1) software was used to analyze the expansion and

contraction of gene families (Figure 1). In CAFÉ, a random birth and death model is

proposed to study gene gain and loss in gene families across a specified phylogenetic tree.

The divergence time is represented by the branch length values. The global parameter λ

(lambda) is estimated using a maximum likelihood method, and a gene family with p <0.05

is considered to have gained or lost genes. The 88 genes identified as members of

significantly expanding gene families (p-value ≤0.01) in pomegranate were mapped to the

KEGG pathways and used for further analysis of functional enrichment(Figure S26).

Page 30: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Identification of transcription factors, transcriptional regulators, protein kinases and R

genes

Searches for consensus transcription factor (TF) and other transcriptional regulator (TR)

domains in pomegranate proteins were performed in the Plant Transcription Factor Database

PlnTFDB (Perez-Rodriguez et al., 2010) (http://plntfdb.bio.uni-potsdam.de/v3.0/) and the

PlantTFDB (Jin et al., 2014) (http://planttfdb.cbi.pku.edu.cn/) using HMMER3.0

(http://hmmer.org/). TFs and TRs were then identified and classified according to the

consensus rules including the required and prohibited protein domains for each TF or TR

gene family summarized at the PlnTFDB.

Plant protein kinases (PKs) were also identified based on domain information. Sequences

with a significant hit for protein kinase domains (PF00069 and PF07714) in the Pfam

database were classified into protein kinase gene families by comparing their sequences to a

set of HMMs of kinase domains (Lehti-Shiu and Shiu, 2012). Based on the above methods,

we predicted TFs and protein kinases in the genomes of pomegranate, Eucalyptus, grape, and

Arabidopsis.

Profile HMMs of Resistance (R) gene domains from the PRGDB database (Sanseverino

et al., 2010) (http://www.prgdb.org) were used to predict R genes. The predicted proteome of

pomegranate was used to query the PRGDB database using hmmsearch in HMMER 3.0 with

default thresholds. The putative R genes predicted in pomegranate, grape, Eucalyptus, and

Arabidopsis were then classified into 10 types according to the domains’ combination rules in

PRGDB (Sanseverino et al., 2010) (http://prgdb.crg.eu/). As for the CC motif in the

Page 31: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

N-terminal region, all proteins were searched using the program paircoil2 (Mcdonnell et

al,2006) with a P-score cut-off of 0.025.

Identification, classification, and phylogenetic analysis of MADS-box genes

We identified 88 MADS-box genes in the pomegranate genome using protein

motif/domain designations and surveyed their expression patterns in the 12 transcriptomes of

‘Dabenzi’ described above in Plant materials. Each of these 88 protein sequences were used

to query the SwissProt database (release-2015_04) using blastp, and the function of the best

hit was assigned to each protein. Sequences of MADS-box genes from A. thaliana were

downloaded from The Arabidopsis Information Resource database

(http://www.arabidopsis.org/browse/genefamily/MADSlike.jsp) (Parenicova et al., 2003),

sequences of MADS-box genes from rice (O. sativa), poplar (P. trichocarpa), grape (V.

vinifera), petunia (Petunia×hybrida), Eucalyptus (E. grandis), and columbine (Aquilegia

coerulea) were downloaded from PlantTFDB

(http://planttfdb.cbi.pku.edu.cn/family.php?fam=M-type). Multiple alignments of these

protein sequences were conducted using MUSCLE software. The phylogenetic tree was

constructed with a maximum likelihood method using PhyML software and a Whelan and

Goldman (WAG) model, and was then visualized and customized using EvolView

(http://evolgenius.info/). Duplication, anchoring, and expression of MADS-box genes is

shown in Figure S27 and Table S26.

Page 32: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Determination of pomegranate seed hardness

Seed hardness in fruits of ‘Dabenzi’ at 50, 65, 80, 95, 110, 125, and 140 DAP,

‘Baiyushizi’ at 50 DAP, and ‘Tunisi’ at 50 DAP were determined using a TA.AX PLUS

texture analyzer(Food Technology Corporation, Sterling, VA)equipped with a HDP/VB

detector. The parameters were as follows. Pre-test, post-test, and test speeds were each 1

mm/s. The touch force was 5 g and the puncture depth was 2 mm. The value at the maximum

peak was the force at seed break, noted as seed hardness (g) (Fawole and Opara, 2013b). The

average hardness of 30 seeds per fruit represented the seed hardness of the fruit (Table S27).

Statistical analysis of hardness was conducted using Tukey’s test in SPSS (Statistical Package

for the Social Sciences,) V16.0 for Windows for fruits at different development stages.

Identification and analysis of candidate genes potentially involved in cellulose,

hemicellulose, and lignin metabolism

Enzymes that participate in the synthesis and degradation of cellulose, hemicellulose, and

lignin were surveyed in KEGG. All of the genes from the pomegranate genome were mapped

to KEGG pathways (Version 76) using blast. Genes that could be aligned to the KEGG

Orthology entries of relevant pathways using blastx (E-value ≤1e-20 and identity ≥60%)

were considered candidate genes for cellulose, hemicellulose, or lignin metabolism.

Log2-transformed FPKM values were used to represent gene expression as a heat map

(Figure 4 and Table S15).

Page 33: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Identification and analysis of candidate genes potentially involved in integument

development

We aligned all pomegranate predicted genes to the SwissProt database

(release-2015_04) using blastx with identity ≥40% and reference coverage ≥50% to obtain

functional annotations for each gene. The function of the best hit was assigned to each

homologous pomegranate gene. Finally, we identified 43 candidate genes potentially

involved in integument development in pomegranate. Genes repressing integument

development ares marked with a minus sign to distinguish them from those promoting

integument development. Detailed data regarding gene expression are provided in Table S16

and Log2-transformed FPKM values were used to show genesexpression as a heat map

(Figure 4).

Phyletic evolution and analysis of positive selection in the YABBY gene family

Genes in the pomegranate genome that contain YABBY domains were identified using

the same rules those used for TF identification described above. Sequences of YABBY genes

in Arabidopsis including YAB1 (AFO), YAB2, YAB3, YAB5, CRC, and INO were downloaded

from the TAIR database (http://www.arabidopsis.org/index.jsp). Multiple sequence

alignments with protein sequences were conducted using MUSCLE. The phylogenetic tree

was constructed using PhyML software with a maximum likelihood method and a WAG

model. The phylogenetic tree was then visualized and customized using EvolView

(http://evolgenius.info/) (Figure 5). YAB1 (AFO), YAB2, YAB5, CRC, and INO were

identified in pomegranate genome.

Page 34: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

We downloaded the INO sequences for 15 species belonging to 13 genera from NCBI to

analyze selection pressures on these genes. Firstly, multiple sequence alignments of proteins

were conducted using GUIDANCE2 with the PRANK algorithm. Then, analysis of selection

was performed using the alignments and phylogenetic trees using the CodeML program in

the PAML (V4.8) package. The average sequencing depth of pomegranate INO was about

105×. Upon manual inspection, we determined that the 35th amino acid residue encoded by

the pomegranate INO gene had likely undergone positive selection during its evolutionary

history (Figure 5).

Sequences of four other YABBY genes, YAB1 (AFO), YAB2, YAB5, CRC, and the

regulators of INO, ANT and BEL1, were then predicted in the gene sets of the above 15

species using Genewise. The phylogenetic tree was constructed using PhyML software with a

maximum likelihood method and a WAG model, and was then visualized and

customized using EvolView (http://evolgenius.info/) (Figure S13). Multiple protein sequence

alignments were visualized and customized using ENDscript (Robert and Gouet 2014)

(Figure S14). Analysis of selection was performed using the PAML (V4.8) package and

positive selection was also detected in the predicted proteins encoded by the pomegranate

ANT and BEL1 genes (Figure S15).

Analysis of the terpene synthase gene family

Pomegranate TPS genes were identified on a genome-wide scale using a homology-based

method in (See Supplementary Note). For E. grandis, we downloaded TPS genes from its

genome (Myburg et al., 2014), then filtered and removed redundancy by picking one

representative protein coding gene of each locus. The sequences of grape and Eucalyptus TPS

genes were extracted from their respective published genomic data (Martin et al., 2010;

Page 35: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Myburg et al., 2014). Sequences of the TPS genes from Arabidopsis were obtained from the

TAIR database. To investigate the phylogenetic relationships among members of the TPS

gene family in pomegranate grape, Eucalyptus, and Arabidopsis, we constructed a

phylogenetic tree as follows.

Multiple alignments of all TPS protein sequences from these species were conducted

using MUSCLE software. The phylogenetic tree was then constructed using PhyML software

with a maximum likelihood method and a WAG model. The phylogenetic tree of TPS

sequences was then visualized and customized using EvolView (Figure S16).

Identification and analysis of genes putatively involved in anthocyanin metabolism and

gene families of their putative transcriptional regulators

We mapped the pomegranate genes to the Anthocyanin biosynthesis pathway

(map00942) in KEGG and aligned them to the KO entries using blastx with E-value ≤1e-20

and identity ≥50%. The bHLH and MYB genes were identified according to same rules as

those used for the prediction of TFs above. WD40-repeat proteins were predicted by

searching the Pfam database with WD40 domains (PF00400).

To study the relationship between anthocyanin content and gene expression, anthocyanin

content and gene expression patterns were analyzed in pomegranate flowers and outer seed

coats at different developmental stages. The total anthocyanin content was determined by

extracting tissues with methanol containing 1% HCl, and measuring absorbence at 530, 620,

and 650 nm by UV spectrophotometer (Puxi Tongyong Apparatus Ltd., Beijing, China).

Anthocyanin content was determined by the formula: OD = (A530 - A620) - 0.1 (A650 -

Page 36: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

A620) (Sarni-Manchado et al., 2000). Anthocyanin content was expressed in units g-1 fresh

weight (FW), where one unit is defined as a hundredfold change in OD.

We calculated the expression levels of the structural genes encoding later steps in the

anthocyanin pathway, DFR, LDOX, BZ1, and UGT75C, and the regulators bHLH, MYB, and

WD40, in flowers and outer seed coats and represented relative expression levels as

log2-transformed FPKM values in heat maps (Figure 6, Table S17 and S18).

Presumptive punicalagin metabolism pathway

Gallic acid is the common precursor for EA and punicalagin biosynthesis. Gallic acid

biosynthesis has been partially elucidated in microbes and plants, where quinate

dehydrogenase (EC 1.1.1.24), 3-dehydroquinate dehydratase (EC 4.2.1.10), shikimate

dehydrogenase (EC 1.1.1.25), dehydroshikimate dehydrogenase (EC 4.2.1.118),

dehydroxylase (1.17.98.x), and methyltransferase (EC 2.1.1.-) are known to participate in the

pathway. As a derivative of gallic acid and EA, the metabolism of punicalagin might also

involve carboxylesterase (EC 3.1.1.1), glycosyltransferase (EC 2.4.1.x), acyltransferases (EC

2.3.1.x), and glucohydrolase (EC 3.2.1.21), according to the molecular structure of each

substrate and the likely required characteristics of each participating enzyme (Figure 7).

In the human intestine, punicalagins are hydrolyzed to EA under physiological

conditions and EA is then gradually metabolized by the intestinal microbiota into different

types of urolithins (Cerda et al., 2004). This process may include lactone-ring cleavage,

decarboxylation, and dehydroxylation reactions (Selma et al., 2009). Tannin acetylhydrolase

(EC 3.1.1.20), dehydroxylase (EC 1.17.98.x), and gallate decarboxylase (EC 4.1.1.59) likely

participate in the transformation of EA to urolithins (Figure 7).

Page 37: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Identification and analysis of candidate genes putatively involved in punicalagin

metabolism

To survey for genes encoding enzymes putatively involved in punicalagin metabolism,

the pomegranate gene set were aligned to KO entries using blastx with E-value ≤1e-20 and

identity ≥60%. For candidate enzymes, the function of the best hit was assigned to each

homologous pomegranate gene and their expression was analyzed in the transcriptomes of the

12 tissues mentioned above in Plant materials.

For methyltransferase (EC 2.1.1.-), the functions of the best hits KO15064 and

KO15066 were assigned to each gene, and then genes containing the domains IPR006222,

IPR013977, and IPR027266 were extracted and considered as candidate genes.

For glycosyltransferase (EC 2.4.1.x), 1174 genes obtained by alignment to KO entries

and 22 reference genes encoding A (AtUGT79B1, AtUGT91A1), B (AtUGT89B1), C

(AtUGT90A1), F (AtUGT78D1), G (AtUGT85A1), H (AtUGT76B1),

I (AtUGT83A1), J (AtUGT87A1), K (AtUGT86A1), L (AtUGT75B1, AtUGT84A1,

AtUGT74B1), M (AtUGT92A1), N (AtUGT82A1), O (AtUGT73B1), P (GRMZM5G834303),

Q (GRMZM2G082037, GRMZM2G037545), and R (Pilosella officinarum UGT95A1) group

glycosyltransferases were used to establish a phylogenetic tree. Genes clustered into the L

group were considered as candidate glycosyltransferase genes (Figure S17). For

acyltransferases (EC 2.3.1.x), eight genes containing IPR023213 and IPR003480, the

consensus domains of the V clade acyltransferases, and nine reference genes encoding

members of the I (NtMAT1, Pf3AT), II (CER2, Glossy2), III (SAAT), IV (ACT, SalAT), and V

(NtBEBT, AtHCT) clades of acyltransferases were used to establish a phylogenetic tree.

Genes clustered into clade V were considered as candidate acyltransferase (Figure S18).

Page 38: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Log2-transformed FPKM values shown as a heat map indicate the gene expressions (Figure 7

and Table S19).

Genes encoding tannin acetylhydrolase (EC3.1.1.20) and dehydroxylase (EC 1.17.98.x)

from Lactobacillus plantarum, Lactobacillus paraplantarum, Clostridium scindens, and

Debaryomyces hansenii were downloaded from NCBI. Coding sequences of the IPR016040,

IPR002347, and IPR020904 71 domains were used to extract dehydroxylase (EC 1.17.98.x)

sequences from the pomegranate genome. Coding sequences of the IPR029058 and

IPR011118 domains were used to extract tannin acetylhydrolase (EC3.1.1.20) sequences

from the pomegranate genome. Multiple alignments of the protein sequences were then

conducted using MUSCLE (Figure S19).

ACCESSION NUMBERS

This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under

the accession MTKT00000000. The version described in this paper is version

MTKT01000000. Raw sequence data of the transcriptomes have been deposited into the

Sequence Read Archive (SRA) under accession SRP100581.

ACKNOWLEDGEMENTS

This project is supported by funding from the Agriculture Research System of Anhui

Province (AHNYCYTX-10), and the Key Laboratory of Genetic Improvement and

Ecophysiology of Horticultural crop, Anhui Province.

Page 39: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Authors’ contributions

Y.X., G.Q., R.M., and Z.Y. conceived the project. G.Q., Z.Y., and C.X wrote the

manuscript. R.M. and E.K. revised the manuscript. G.Q., Y.X., R.M., Z.Y., and C.X designed

the analyses. Z.G. and H.P. prepared the samples. C.X., Y.H., and Y.T. contributed to

sequencing, sequence assembly, genome annotation, and genome structure analyses. Y.Q.

contributed to flow cytometry analysis. G.Q., C.X, X,Y., Y.Q., Z.G, X.X., Y.H., and H.P.

contributed to the transcript analysis. H.T and Y.C contributed to the evolutionary analyses.

R.G. contributed to the TE analysis. E.K. contributed to MADS-box analysis. J.J., R.M.,

C.X., G.Q., and Z.Y. contributed to genetic mapping and chromosome anchoring. G.Q.,

R.M., Y.X., and C.X. contributed to the pathway analysis.

CONFLICT OF INTEREST

The authors declare that they have no conflict of interest.

SUPPORTING INFORMATION

Supplementary note. Additional information about detailed methods and results.

Supplementary Figures

Figure S1 Chromosomes in root tip cells of pomegranate (2n = 2x = 18).

Figure S2 Estimation of pomegranate genome size k-mer analysis.

Figure S3 Flow cytometric analysis of DNA content.

Figure S4 The nine linkage groups of the pomegranate genome.

Figure S5 Timing of LTR retrotransposon insertions in pomegranate.

Page 40: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Figure S6 Diagram showing the functional enrichment of GO (Gene Ontology) terms in

pomegranate orphan gene families.

Figure S7 Venn diagram showing orthologous groups shared among pomegranate, grape,

tomato, E. grandis, and A. thaliana.

Figure S8 Distribution of 4dTv distances between duplicated genes of syntenic regions in

pomegranate (green), Eucalyptus (red), and grape (blue).

Figure S9 Dot plots of pairwise syntenic blocks in pomegranate and grape and Eucalyptus.

Figure S10 Syntenic depths in comparisons of the pomegranate genome to the genomes of

grape and Eucalyptus.

Figure S11 The genomic distribution of pomegranate R genes.

Figure S12 Phylogenetic tree of MADS-box genes in pomegranate.

Figure S13 Phylogenetic tree of the YABBY gene family in pomegranate.

Figure S14 Multiple sequence alignment and analysis of positive selection in YABBY genes

in pomegranate.

Figure S15 Multiple sequence alignment of ANT genes and positive selection analysis of

BEL1 genes.

Figure S16 Phylogenetic tree of the TPS gene family in pomegranate.

Figure S17 Phylogenetic tree of glycosyltransferase (EC: 2.4.1.x) genes from pomegranate

and other species.

Figure S18 Phylogenetic tree of acyltransferases (EC: 2.3.1.x) genes from pomegranate and

other plant species.

FigureS19 Multiple sequence alignment dehydroxylase (EC 1.17.98.x) genes between

pomegranate and C. scindens.

Figure S20 Statistics of RNA-Seq support for the final gene set.

Page 41: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Figure S21 Phylogenetic tree constructed with Phase 1 sites of orthologous genes using

MrBayes.

Figure S22 Estimation of divergence time for pomegranate.

Figure S23 Distribution of Ks in the pomegranate gene set.

Figure S24 Enrichment of KEGG pathway functions in pomegranate genes during γ-WGD

events.

Figure S25 Enrichment of KEGG pathway functions in pomegranate genes during

lineage-specific M-WGD events.

Figure S26 Enrichment of KEGG pathway functions in significantly expanded gene families

in the pomegranate genome.

Figure S27 Duplication, anchoring, and expression of MADS-box genes.

Supplementary Tables

Table S1 Statistics for libraries with different insert sizes used in genome assembly.

Table S2 Comparison of the pomegranate genome to other sequenced plant genomes. Table

S3 Evaluation of the genome assembly with unigenes.

Table S4 Statistics regarding the genetic map of the F1 population from the cross between

pomegranate cultivars ‘Dabenzi’ and ‘Baiyushizi’.

Table S5 Comparison of final anchored genetic maps of pomegranate, pear, jujube, and mei.

Table S6 Length statistics of classified transposable elements identified de novo.

Table S7 Statistics regarding gene annotation of pomegranate.

Table S8 Functional annotation of the pomegranate gene set.

Table S9 Gene families clustered using OrthoMCL in 12 species.

Page 42: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Table S10 Significantly enriched KEGG pathways among orphan gene families (Q value <

0.01) in pomegranate.

Table S11 Significantly enriched KEGG pathways among genes in significantly expanded

pomegranate gene families.

Table S12 Resistance (R) genes in pomegranate, Arabidopsis, Eucalyptus, and grape

genome.

Table S13 Protein kinases in the pomegranate, Arabidopsis, Eucalyptus, and grape genomes.

Table S14 TFs in pomegranate, Arabidopsis, Eucalyptus, and grape genomes.

Table S15 Expression of genes involved in hemicellulose, cellulose, and lignin metabolism

in pomegranate.

Table S16 Expression of genes involved in integument development of pomegranate.

Table S17 The expression of genes encoding DFR, LDOX, BZ1 and UGT75C in

pomegranate.

Table S18 Expression of members of the bHLH, MYB, and WD40 gene families in

pomegranate.

Table S19 Expression of candidate genes for punicalagin metabolism in pomegranate.

Table S20 Plant materials used for genome sequencing and RNA-Seq.

Table S21 Statistics of the pomegranate genome assembly.

Table S22 Statistics of repetitive DNA sequences in the pomegranate genome.

Table S23 Statistics of TE content of the assembled pomegranate genome.

Page 43: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Table S24 Significantly enriched KEGG pathway terms among pomegranate genes affected

by two WGD events and one small-scale gene duplication event.

Table S25 Significantly enriched GO (Gene Ontology) terms among pomegranate genes

affected by two WGD events and one small-scale gene duplication event.

Table S26 Expression of pomegranate MADS-box genes in 10 RNA-Seq samples.

Table S27 Hardness of seeds at different developmental stages and among different cultivars.

REFERENCES

Adams, L.S., Zhang, Y., Seeram, N.P., Heber, D. and Chen, S. (2010) Pomegranate ellagitannin-derived

compounds exhibit antiproliferative and antiaromatase activity in breast cancer cells in vitro. Cancer

Prev. Res. (Phila), 3, 108-113.

Angiosperm Phylogeny Group. (2016) An update of the Angiosperm Phylogeny Group classification for

the orders and families of flowering plants: APG IV. Botanical J. Linn. Soc, 181, 1-20.

Arabidopsis Genome Initiative. (2000) Analysis of the genome sequence of the flowering plant

Arabidopsis thaliana. Nature, 408, 796-815.

Argout, X., Salse, J., Aury, J.M., et al. (2011) The genome of Theobroma cacao. Nat. Genet., 43, 101-108.

Ashburner, M., Ball, C.A., Blake, J.A., et al. (2000) Gene ontology: tool for the unification of biology.

The Gene Ontology Consortium. Nat. Genet., 25, 25-29.

Attwood, T.K., Beck, M.E., Bleasby, A.J. and Parry-Smith, D.J. (1994) PRINTS--a database of protein

motif fingerprints. Nucleic Acids Res., 22, 3590-3596.

Baird, N.A., Etter, P.D., Atwood, T.S., Currey, M.C., Shiver, A.L., Lewis, Z.A., Selker, E.U., Cresko,

W.A. and Johnson, E.A. (2008) Rapid SNP discovery and genetic mapping using sequenced RAD

markers. PLoS One, 3, e3376.

Batish, D.R., Singh, H.P., Kohli, R.K. and Kaur, S. (2008) Eucalyptus essential oil as a natural pesticide.

Forest Ecol. Manag., 256, 2166-2174.

Page 44: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Benderoth, M., Textor, S., Windsor, A.J., Mitchell-Olds, T., Gershenzon, J. and Kroymann, J. (2006)

Positive selection driving diversification in plant secondary metabolism. Proc. Natl. Acad. Sci. U S A.,

103, 9118-9123.

Benson, G. (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res., 27,

573-580.

Birney, E. and Durbin, R. (2000) Using GeneWise in the Drosophila annotation experiment. Genome Res.,

10, 547-548.

Bretschneider E. (1935). History of European botanical discoveries in China. Hamburg: K.F. Koehlers

Antiquarium, pp. 1-57.

Bru, C., Courcelle, E., Carrere, S., Beausse, Y., Dalmar, S. and Kahn, D. (2005) The ProDom database

of protein domain families: more emphasis on 3D. Nucleic Acids Res., 33, D212-D215.

Cerda, B., Espin, J.C., Parra, S., Martinez, P. and Tomas-Barberan, F.A. (2004) The potent in vitro

antioxidant ellagitannins from pomegranate juice are metabolised into bioavailable but poor antioxidant

hydroxy-6H-dibenzopyran-6-one derivatives by the colonic microflora of healthy humans. Eur. J. Nutr.,

43, 205-220.

Conesa, A., Gotz, S., Garcia-Gomez, J.M., Terol, J., Talon, M. and Robles, M. (2005) Blast2GO: a

universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics,

21, 3674-3676.

Cucinotta, M., Colombo, L. and Roig-Villanova, I. (2014) Ovule development, a new model for lateral

organ formation. Front Plant Sci., 5, 1-12.

D'Auria, J.C. (2006) Acyltransferases in plants: a good time to be BAHD. Curr. Opin. Plant Biol., 9,

331-340.

De Bie, T., Cristianini, N., Demuth, J.P. and Hahn, M.W. (2006) CAFE: a computational tool for the

study of gene family evolution. Bioinformatics, 22, 1269-1271.

Debeaujon, I., Leon-Kloosterziel, K.M. and Koornneef, M. (2000) Influence of the testa on seed

dormancy, germination, and longevity in Arabidopsis. Plant Physiol., 122, 403-414.

Denoeud, F., Carretero-Paulet, L., Dereeper, A., et al. (2014) The coffee genome provides insight into the

convergent evolution of caffeine biosynthesis. Science, 345, 1181-1184.

Page 45: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Nucleic Acids Res., 32, 1792-1797.

Edgar, R.C. and Myers, E.W. (2005) PILER: identification and classification of genomic repeats.

Bioinformatics, 21, i152-i158.

Endress, P.K. (2011a) Angiosperm ovules: diversity, development, evolution. Ann.. Bot., 107, 1465-1489.

Endress, P.K. (2011b) Evolutionary diversification of the flowers in angiosperms. Am. J. Bot.,

98,370-396.

Espin, J.C., Larrosa, M., Garcia-Conesa, M.T. and Tomas-Barberan, F. (2013) Biological significance

of urolithins, the gut microbial ellagic acid-derived metabolites: the evidence so far. Evid. Based

Complement. Alternat. Med., 2013, 270-418.

Fawole, O.A. and Opara, U.L. (2013a) Changes in physical properties, chemical and elemental

composition and antioxidant capacity of pomegranate (cv. Ruby) fruit at five maturity stages. Sci.

Hortic., 150, 37-46.

Fawole, O.A. and Opara, U.L. (2013b) Fruit growth dynamics, respiration rate and physico-textural

properties during pomegranate development and ripening. Sci. Hortic., 157: 90-98.

Fischer U A, Carle R, Kammerer D R. (2011) Identification and quantification of phenolic compounds

from pomegranate (Punica granatum, L.) peel, mesocarp, aril and differently produced juices by

HPLC-DAD–ESI/MSn. Food Chem.,127:807-821.

Flutre, T., Duprat, E., Feuillet, C. and Quesneville, H. (2011) Considering transposable element

diversification in de novo annotation approaches. PLoS One, 6, e16526.

Gershenzon, J. and Dudareva, N. (2007) The function of terpene natural products in the natural world.

Nat. Chem. Biol., 3, 408-414.

Gil, M.I., Tomas-Barberan, F.A., Hess-Pierce, B., Holcroft, D.M. and Kader, A.A. (2000) Antioxidant

activity of pomegranate juice and its relationship with phenolic composition and processing. J. Agric.

Food Chem., 48, 4581-4589.

Goff, S.A., Ricke, D., Lan, T.H., et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp.

japonica). Science, 296, 92-100.

Page 46: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Gonzalez, A., Mendenhall, J., Huo, Y. and Lloyd, A. (2009) TTG1 complex MYBs, MYB5 and TT2,

control outer seed coat differentiation. Dev. Biol., 325, 412-421.

Gonzalez, A., Zhao, M., Leavitt, J.M. and Lloyd, A.M. (2008) Regulation of the anthocyanin biosynthetic

pathway by the TTG1/bHLH/Myb transcriptional complex in Arabidopsis seedlings. Plant J., 53,

814-827.

Goodstein, D.M., Shu, S., Howson, R., et al. (2012) Phytozome: a comparative platform for green plant

genomics. Nucleic Acids Res., 40, D1178-D1186.

Gough, J., Karplus, K., Hughey, R. and Chothia, C. (2001) Assignment of homology to genome

sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol.

Biol., 313, 903-919.

Grattapaglia, D. and Sederoff, R. (1994) Genetic linkage maps of Eucalyptus grandis and Eucalyptus

urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics, 137, 1121-1137.

Guindon, S., Dufayard, J.F., Lefort, V., Anisimova, M., Hordijk, W. and Gascuel, O. (2010) New

algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of

PhyML 3.0. Syst. Biol., 59, 307-321.

Hasegawa, M., Kishino, H. and Yano, T. (1985) Dating of the human-ape splitting by a molecular clock of

mitochondrial DNA. J. Mol. Evol., 22, 160-174.

Henning, S.M., Zhang, Y., Rontoyanni, V.G., Huang, J., Lee, R.P., Trang, A., Nuernberger, G. and

Heber, D. (2014) Variability in the antioxidant activity of dietary supplements from pomegranate, milk

thistle, green tea, grape seed, goji, and acai: effects of in vitro digestion. J. Agric. Food Chem., 62,

4313-4321.

Holland, D., Hatib, K. and Bar-Ya’akov, I. (2009) Pomegranate botany, horticulture, breeding. Hortic.

Rev. ( Am. Soc. Hortic. Sci.), 35, 127-191.

Hu, W.J., Harding, S.A., Lung, J., Popko, J.L., Ralph, J., Stokke, D.D., Tsai, C.J. and Chiang, V.L.

(1999) Repression of lignin biosynthesis promotes cellulose accumulation and growth in transgenic trees.

Nat. Biotechnol, 17, 808-812.

Huang, S., Li, R., Zhang, Z., et al. (2009) The genome of the cucumber, Cucumis sativus L. Nat. Genet.,

41, 1275-1281.

Page 47: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Huelsenbeck, J.P. and Ronquist, F. (2001) MRBAYES: Bayesian inference of phylogenetic trees.

Bioinformatics, 17, 754-755.

Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., De Castro, E., Langendijk-Genevaux, P.S., Pagni, M.

and Sigrist, C.J. (2006) The PROSITE database. Nucleic Acids Res., 34, D227-D230.

Jaillon, O., Aury, J.M., Noel, B., et al. (2007) The grapevine genome sequence suggests ancestral

hexaploidization in major angiosperm phyla. Nature, 449, 463-467.

Janick, J. (2007) Fruits of the Bibles. HortScience, 42, 1072-1076.

Jiao, Y., Leebens-Mack, J., Ayyampalayam, S., et al. (2012) A genome triplication associated with early

diversification of the core eudicots. Genome Biol., 13, 1-14.

Jin, J., Zhang, H., Kong, L., Gao, G. and Luo, J. (2014) PlantTFDB 3.0: a portal for the functional and

evolutionary study of plant transcription factors. Nucleic Acids Res., 42, 1182-1187.

Johanningsmeier, S.D. and Harris, G.K. (2011) Pomegranate as a functional food and nutraceutical

source. Annu. Rev. Food Sci. Technol., 2, 181-201.

Jones, P., Binns, D., Chang, H.Y., et al. (2014) InterProScan 5: genome-scale protein function

classification. Bioinformatics, 30, 1236-1240.

Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O. and Walichiewicz, J. (2005)

Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic Genome Res., 110, 462-467.

Kanehisa, M., Araki, M., Goto, S., et al. (2008) KEGG for linking genomes to life and the environment.

Nucleic Acids Res., 36, D480-D484.

Kanehisa, M. and Goto, S. (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res.,

28, 27-30.

Kasimsetty, S.G., Bialonska, D., Reddy, M.K., Ma, G., Khan, S.I. and Ferreira, D. (2010) Colon cancer

chemopreventive activities of pomegranate ellagitannins and urolithins. J. Agric. Food Chem., 58,

2180-2187.

Kent, W.J. (2002) BLAT--the BLAST-like alignment tool. Genome Res., 12, 656-664.

Kielbasa, S.M., Wan, R., Sato, K., Horton, P. and Frith, M.C. (2011) Adaptive seeds tame genomic

sequence comparison. Genome Res., 21, 487-493.

Kim, N.D., Mehta, R., Yu, W., et al. (2002) Chemopreventive and adjuvant therapeutic potential of

pomegranate (Punica granatum) for human breast cancer. Breast Cancer Res. Treat., 71, 203-217.

Page 48: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Kirshinbaum, L.M. (2012) Identification of aroma-active compounds in ‘Wonderful’ pomegranate fruit

using solvent-assisted flavour evaporation and headspace solid-phase micro-extraction methods. Eur.

Food Res.Technol., 235,277-283.

Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C. and Salzberg, S.L.

(2004) Versatile and open software for comparing large genomes. Genome Biol., 5, R12.

Lam, S.D., Dawson, N.L., Das, S., Sillitoe, I., Ashford, P., Lee, D., Lehtinen, S., Orengo, C.A. and Lees,

J.G. (2016) Gene3D: expanding the utility of domain assignments. Nucleic Acids Res., 44, D404-409.

Landete, J.M. (2011) Ellagitannins, ellagic acid and their derived metabolites: A review about source,

metabolism, functions and health. Food Res. Int., 44, 1150–1160.

Langley, P. (2000) Why a pomegranate? BMJ, 321, 1153-1154.

Larkin, M.A., Blackshields, G., Brown, N.P., et al. (2007) Clustal W and Clustal X version 2.0.

Bioinformatics, 23, 2947-2948.

Lehti-Shiu, M.D. and Shiu, S.H. (2012) Diversity, classification and function of the plant protein kinase

superfamily. Philos. Trans. R. Soc. Lond. B Biol. Sci., 367, 2619-2639.

Levin, G.M. (2006) Pomegranate Roads: A Soviet Botanist's Exile from Eden. Forestville, CA: Floreat

Press, pp.37-38.

Li, L., Stoeckert, C.J., Jr. and Roos, D.S. (2003) OrthoMCL: identification of ortholog groups for

eukaryotic genomes. Genome Res., 13, 2178-2189.

Li, R., Yu, C., Li, Y., Lam, T.W., Yiu, S.M., Kristiansen, K. and Wang, J. (2009) SOAP2: an improved

ultrafast tool for short read alignment. Bioinformatics, 25, 1966-1967.

Li, S. (1998) Compendium of materia medica essence. Beijing: Science Press, pp.416-417.

Li, Y., Zheng, L., Corke, F., Smith, C. and Bevan, M.W. (2008) Control of final seed and organ size by

the DA1 gene family in Arabidopsis thaliana. Genes. Dev., 22, 1331-1336.

Lim, E.K., Doucet, C.J., Li, Y., Elias, L., Worrall, D., Spencer, S.P., Ross, J. and Bowles, D.J. (2002)

The activity of Arabidopsis glycosyltransferases toward salicylic acid, 4-hydroxybenzoic acid, and other

benzoates. J. Biol. Chem., 277, 586-592.

Lisch, D. (2013) How important are transposons for plant evolution? Nat Rev Genet, 14, 49-61.

Page 49: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Liu, M.J., Zhao, J., Cai, Q.L., et al. (2014) The complex jujube genome provides insights into fruit tree

biology. Nat. Commun., 5, 5315.

Longtin, R. (2003) The pomegranate: nature's power fruit? J Natl Cancer Inst, 95, 346-348.

Luo, R., Liu, B., Xie, Y., et al. (2012) SOAPdenovo2: an empirically improved memory-efficient

short-read de novo assembler. Gigascience, 1, 18.

Magallon, S., Gomez-Acevedo, S., Sanchez-Reyes, L.L. and Hernandez-Hernandez, T. (2015) A

metacalibrated time-tree documents the early rise of flowering plant phylogenetic diversity. New Phytol.,

207, 437-453.

Majoros, W.H., Pertea, M. and Salzberg, S.L. (2004) TigrScan and GlimmerHMM: two open source ab

initio eukaryotic gene-finders. Bioinformatics, 20, 2878-2879.

Martin, D.M., Aubourg, S., Schouwey, M.B., Daviet, L., Schalk, M., Toub, O., Lund, S.T. and

Bohlmann, J. (2010) Functional annotation, genome organization and phylogeny of the grapevine (Vitis

vinifera) terpene synthase gene family based on genome assembly, FLcDNA cloning, and enzyme

assays. BMC Plant Biol., 10, 226.

Martin, W., Deusch, O., Stawski, N., Grunheit, N. and Goremykin, V. (2005) Chloroplast genome

phylogenetics: why we need independent approaches to plant molecular evolution. Trends Plant Sci., 10,

203-209.

Masiero, S., Colombo, L., Grini, P.E., Schnittger, A. and Kater, M.M. (2011) The emerging importance

of type I MADS box transcription factors for plant reproduction. Plant Cell, 23, 865-872.

McAbee, J.M., Kuzoff, R.K. and Gasser, C.S. (2005) Mechanisms of derived unitegmy among Impatiens

species. Plant Cell, 17, 1674-1684.

McCarthy, E.M. and McDonald, J.F. (2003) LTR_STRUC: a novel search and identification program for

LTR retrotransposons. Bioinformatics, 19, 362-367.

McColl, R. W. (1991). China's Silk Roads: A modern journey to China's western regions. Focus, 41, 1-6.

Mcdonnell, A.V., Jiang, T., Keating, A.E. and Berger, B. (2006) Paircoil2: improved

prediction of coiled coils from sequence. Bioinformatics, 22, 356-358.

Meister, R.J., Oldenhof, H., Bowman, J.L. and Gasser, C.S. (2005) Multiple protein regions contribute to

differential activities of YABBY proteins in reproductive development. Plant Physiol., 137, 651-662.

Page 50: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Meister, R.J., Williams, L.A., Monfared, M.M., Gallagher, T.L., Kraft, E.A., Nelson, C.G. and

Gasser, C.S. (2004) Definition and interactions of a positive regulatory element of the Arabidopsis

INNER NO OUTER promoter. Plant J., 37, 426-438.

Melgarejo, P., Calin-Sanchez, A., Vazquez-Araujo, L., Hernandez, F., Martinez, J.J., Legua, P. and

Carbonell-Barrachina, A.A. (2011) Volatile composition of pomegranates from 9 Spanish cultivars

using headspace solid phase microextraction. J. Food Sci., 76, S114-120.

Ming, R., Hou, S., Feng, Y., et al. (2008) The draft genome of the transgenic tropical fruit tree papaya

(Carica papaya Linnaeus). Nature, 452, 991-996.

Mishra, N.C. and Gold, B. (2006) Synthesis of 13C- and 14C‐labeled ellagic acid. J. Labelled Comp., 28,

927-941.

Mistry, J. and Finn, R. (2007) Pfam: a domain-centric method for analyzing proteins and proteomes.

Methods Mol. Biol., 396, 43-58.

Morton, J.F. and Dowling, C.F. (1987) Fruits of warm climates. Miami, Florida: Florida Flair Books, pp.

352-355.

Mount, D.W. (2007) Using the Basic Local Alignment Search Tool (BLAST). CSH Protoc, 2007, pdb

top17.

Murray, M.G. and Thompson, W.F. (1980) Rapid isolation of high molecular weight plant DNA. Nucleic

Acids Res., 8, 4321-4325.

Myburg, A.A., Grattapaglia, D., Tuskan, G.A., et al. (2014) The genome of Eucalyptus grandis. Nature,

510, 356-362.

Nallely N.j., Paulina N., Sandra M.p., Francisca H., Ángel A. C.B, Aneta W. (2015) Identification and

quantification of major derivatives of ellagic acid and antioxidant properties of thinning and ripe

Spanish pomegranates. J. Funct. Foods, 12,354-364.

Nei, M. and Gojobori, T. (1986) Simple methods for estimating the numbers of synonymous and

nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 3, 418-426.

Ossipov, V., Ossipova, S., Salminen, J.-P., Haukioja, E. and Pihlaja, K. (2003) Gallic acid and

hydrolysable tannins are formed in birch leaves from an intermediate compound of the shikimate

pathway. Biochem. Syst. Ecol., 31, 3-16.

Page 51: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Parashar, A. (2011) Pomegranate juice is potentially better than orange juice in improving antioxidant

function in elderly subjects. Pharma Res., 1, 14-23.

Parenicova, L., de Folter, S., Kieffer, M., et al. (2003) Molecular and phylogenetic analyses of the

complete MADS-box transcription factor family in Arabidopsis: new openings to the MADS world.

Plant Cell, 15, 1538-1551.

Pedruzzi, I., Rivoire, C., Auchincloss, A.H., et al. (2015) HAMAP in 2015: updates to the protein family

classification and annotation system. Nucleic Acids Res., 43, D1064-1070.

Pelletier, M.K., Burbulis, I.E. and Winkel-Shirley, B. (1999) Disruption of specific flavonoid genes

enhances the accumulation of flavonoid enzymes and end-products in Arabidopsis seedlings. Plant Mol.

Biol., 40, 45-54.

Perez-Rodriguez, P., Riano-Pachon, D.M., Correa, L.G., Rensing, S.A., Kersten, B. and

Mueller-Roeber, B. (2010) PlnTFDB: updated content and new features of the plant transcription factor

database. Nucleic Acids Res., 38, D822-827.

Perez, J., Munoz-Dorado, J., de la Rubia, T. and Martinez, J. (2002) Biodegradation and biological

treatments of cellulose, hemicellulose and lignin: an overview. Int. Microbiol., 5, 53-63.

Prasad, K. and Ambrose, B.A. (2010) Shaping up the fruit: control of fruit size by an Arabidopsis B-sister

MADS-box gene. Plant Signal. Behav., 5, 899-902.

Prasad, K., Zhang, X., Tobon, E. and Ambrose, B.A. (2010) The Arabidopsis B-sister MADS-box

protein, GORDITA, represses fruit growth and contributes to integument development. Plant J., 62,

203-214.

Price, A.L., Jones, N.C. and Pevzner, P.A. (2005) De novo identification of repeat families in large

genomes. Bioinformatics, 21 Suppl 1, i351-i358.

Quesneville, H., Bergman, C.M., Andrieu, O., Autard, D., Nouaud, D., Ashburner, M. and

Anxolabehere, D. (2005) Combined evidence annotation of transposable elements in genome

sequences. PLoS Comput. Biol., 1, 166-175.

Robert, X. and Gouet, P. (2014) Deciphering key features in protein structures with the new ENDscript

server. Nucleic Acids Res., 42, W320-324.

Ryu, D., Mouchiroud, L., Andreux, P.A., et al. (2016) Urolithin A induces mitophagy and prolongs

lifespan in C. elegans and increases muscle function in rodents. Nat. Med., 22, 879-888.

Page 52: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Sanchez-Lamar, A., Fonseca, G., Fuentes, J.L., et al. (2008) Assessment of the genotoxic risk of Punica

granatum L. (Punicaceae) whole fruit extracts. J. Ethnopharmacol., 115, 416-422.

SanMiguel, P., Gaut, B.S., Tikhonov, A., Nakajima, Y. and Bennetzen, J.L. (1998) The paleontology of

intergene retrotransposons of maize. Nat. Genet., 20, 43-45.

Sanseverino, W., Roma, G., De Simone, M., Faino, L., Melito, S., Stupka, E., Frusciante, L. and

Ercolano, M.R. (2010) PRGdb: a bioinformatics platform for plant resistance gene analysis. Nucleic

Acids Res., 38, D814-D821.

Sarni-Manchado, P., Le Roux, E., Le Guerneve, C., Lozano, Y. and Cheynier, V. (2000) Phenolic

composition of litchi fruit pericarp. J. Agric. Food Chem., 48, 5995-6002.

Schultz, J., Milpetz, F., Bork, P. and Ponting, C.P. (1998) SMART, a simple modular architecture

research tool: identification of signaling domains. Proc. Natl. Acad. Sci. U S A, 95, 5857-5864.

Seeram, N.P., Lee, R. and Heber, D. (2004) Bioavailability of ellagic acid in human plasma after

consumption of ellagitannins from pomegranate (Punica granatum L.) juice. Clin. Chim. Acta, 348,

63-68.

Seeram, N.P., Schulman, R.N. and Heber, D. (2006) Pomegranates ancient roots to modern medicine.

Boca Raton, Florial: CRC Press, pp. 63-183.

Selma, M.V., Beltran, D., Garcia-Villalba, R., Espin, J.C. and Tomas-Barberan, F.A. (2014)

Description of urolithin production capacity from ellagic acid of two human intestinal Gordonibacter

species. Food Funct., 5, 1779-1784.

Selma, M.V., Espin, J.C. and Tomas-Barberan, F.A. (2009) Interaction between phenolics and gut

microbiota: role in human health. J. Agric. Food Chem., 57, 6485-6501.

Sieber, P., Gheyselinck, J., Gross-Hardt, R., Laux, T., Grossniklaus, U. and Schneitz, K. (2004) Pattern

formation during early ovule development in Arabidopsis thaliana. Dev. Biol., 273, 321-334.

Siegfried, K.R., Eshed, Y., Baum, S.F., Otsuga, D., Drews, G.N. and Bowman, J.L. (1999) Members of

the YABBY gene family specify abaxial cell fate in Arabidopsis. Development, 126, 4117-4128.

Singh, S.A. and Christendat, D. (2006) Structure of Arabidopsis dehydroquinate dehydratase-shikimate

dehydrogenase and implications for metabolic channeling in the shikimate pathway. Biochem., 45,

7787-7796.

Page 53: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Skinner, D.J., Hill, T.A. and Gasser, C.S. (2004) Regulation of ovule development. Plant Cell, 16 Suppl,

S32-S45.

Smaczniak, C., Immink, R.G., Angenent, G.C. and Kaufmann, K. (2012) Developmental and

evolutionary diversity of plant MADS-domain factors: insights from recent studies. Development, 139,

3081-3098.

Soltis D. E., Smith S. A., Cellinese N., et al. (2011) Angiosperm phylogeny: 17 genes, 640 taxa. Am. J.

Bot., 98, 704-730.

Sreekumar, S., Sithul, H., Muraleedharan, P., Azeez, J.M. and Sreeharshan, S. (2014) Pomegranate

fruit as a rich source of biologically active compounds. Biomed. Res. Int., 2014, 1-12.

Stanke, M., Keller, O., Gunduz, I., Hayes, A., Waack, S. and Morgenstern, B. (2006) AUGUSTUS: ab

initio prediction of alternative transcripts. Nucleic Acids Res., 34, W435-W439.

Suyama, M., Torrents, D. and Bork, P. (2006) PAL2NAL: robust conversion of protein sequence

alignments into the corresponding codon alignments. Nucleic Acids Res., 34, W609-W612.

Tarailo-Graovac, M. and Chen, N. (2009) Using RepeatMasker to identify repetitive elements in genomic

sequences. Curr. Protoc. Bioinformatics, 4.10, 1-14.

The Tomato Genome Consortium. (2012) The tomato genome sequence provides insights into fleshy fruit

evolution. Nature, 485, 635-641.

Thouet, J., Quinet, M., Lutts, S., Kinet, J.M. and Perilleux, C. (2012) Repression of floral meristem fate

is crucial in shaping tomato inflorescence. PLoS One, 7, e31096.

Toriba, T., Harada, K., Takamura, A., Nakamura, H., Ichikawa, H., Suzaki, T. and Hirano, H.Y.

(2007) Molecular characterization the YABBY gene family in Oryza sativa and expression analysis of

OsYABBY1. Mol. Genet. Genomics, 277, 457-468.

Trapnell, C., Pachter, L. and Salzberg, S.L. (2009) TopHat: discovering splice junctions with RNA-Seq.

Bioinformatics, 25, 1105-1111.

Tuskan, G.A., Difazio, S., Jansson, S., et al. (2006) The genome of black cottonwood, Populus

trichocarpa (Torr. & Gray). Science, 313, 1596-1604.

Page 54: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Tzulker, R., Glazer, I., Bar-Ilan, I., Holland, D., Aviram, M. and Amir, R. (2007) Antioxidant activity,

polyphenol content, and related compounds in different fruit juices and homogenates prepared from 29

different pomegranate accessions. J. Agric. Food Chem., 55, 9559-9570.

Ulrike A. Fischer, R.C., and Dietmar R. K. (2011) Identification and quantification of phenolic

compounds from pomegranate (Punica granatum L.) peel, mesocarp, aril and differently

produced juices by HPLC-DAD–ESI/MSn. Food Chem., 127, 807-821.

Usta, C., Ozdemir, S., Schiariti, M. and Puddu, P.E. (2013) The pharmacological use of ellagic acid-rich

pomegranate fruit. Int. J. Food Sci. Nutr., 64, 907-913.

Velasco, R., Zharkikh, A., Affourtit, J., et al. (2010) The genome of the domesticated apple (Malus x

domestica Borkh.). Nat. Genet., 42, 833-839.

Verde, I., Abbott, A.G., Scalabrin, S., et al. (2013) The high-quality draft genome of peach (Prunus

persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat. Genet.,

45, 487-494.

Villanueva, J.M., Broadhvest, J., Hauser, B.A., Meister, R.J., Schneitz, K. and Gasser, C.S. (1999)

INNER NO OUTER regulates abaxial- adaxial patterning in Arabidopsis ovules. Genes Dev., 13,

3160-3169.

Viuda-Martos, M. Femández-López, J. Pérez-Álvarez,J.A. (2010) Pomegranate and its many functional

components as related to human health: a review. Compr Rev. Food Sci. Food Saf., 9, 635-654.

Wikstrom, N., Savolainen, V. and Chase, M.W. (2001) Evolution of the angiosperms: calibrating the

family tree. Proc. Biol. Sci., 268, 2211-2220.

Wink, M. (2003) Evolution of secondary metabolites from an ecological and molecular phylogenetic

perspective. Phytochem., 64:3-19.

Winkel-Shirley, B. (2001) Flavonoid biosynthesis. a colorful model for genetics, biochemistry, cell

biology, and biotechnology. Plant Physiol., 126, 485-493.

Page 55: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

Xia, T., Li, N., Dumenil, J., Li, J., Kamenski, A., Bevan, M.W., Gao, F. and Li, Y. (2013) The ubiquitin

receptor DA1 interacts with the E3 ubiquitin ligase DA2 to regulate seed and organ size in Arabidopsis.

Plant Cell, 25, 3347-3359.

Xu, Z. and Wang, H. (2007) LTR_FINDER: an efficient tool for the prediction of full-length LTR

retrotransposons. Nucleic Acids Res., 35, W265-W268.

Yan, W., Chen, D. and Kaufmann, K. (2016) Molecular mechanisms of floral organ specification by

MADS domain proteins. Curr. Opin. Plant Biol., 29, 154-162.

Yang, Z. (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol., 24, 1586-1591.

Zhao, X., Yuan, Z., Feng, L. and Fang, Y. (2015) Cloning and expression of anthocyanin biosynthetic

genes in red and white pomegranate. J. Plant Res., 128, 687-696.

FIGURE LEGENDS

Figure 1 Characterization of the pomegranate genome. The innermost circle represents

ideograms of the nine pomegranate pseudo-chromosomes and the synteny relationship of

gene blocks between pseudo-chromosomes. All density information from the inner to the

outer circle was counted in non-overlapping 200-kb windows. a, LINE repeat; b, Gypsy

repeat; c, DNA repeat; d, Copia repeat; e, unknown repeat; f, exon; g, GC content.

Figure 2 Phylogenetic tree and expansion and contraction of gene families. The phylogenetic

tree was constructed from a concatenated alignment of 509 families of single-copy genes

from 12 higher plant species. MRCA is the most recent common ancestor. Gene family

expansions are indicated in red, and gene family contractions are indicated in green; the

corresponding proportions among total changes are shown using the same colors in the pie

charts. Blue portions of the pie charts represent the conserved gene families. Inferred

divergence dates (in millions of years) are indicated in black at each node. Circles in gray

Page 56: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

represent WGD events (γ and M). The numbers of gene families, the numbers of genes within

each gene family, and the numbers of families of unique genes by species are shown on the

right.

Figure 3 Collinearity patterns between grape, peach, pomegranate, and Eucalyptus. (a)

Typical macro-collinearity patterns between genomic regions from grape, peach,

pomegranate, and Eucalyptus. The macro-colinearity pattern shows that a typical ancestral

region in the grape genome can be traced to one region in peach and up to two regions each

in pomegranate and Eucalyptus, highlighting the inferred Myrtales WGD. Grey wedges in the

background indicate syntenic blocks spanning more than 30 genes between the genomes; one

of them is highlighted in red. (b) Micro-collinearity pattern between grape, peach,

pomegranate, and Eucalyptus. Rectangles represent predicted gene models with colors

showing relative orientations (blue: same strand, green: opposite strand). Matching gene pairs

are displayed as connected boxes. Gray wedges connect matching gene pairs; one of them is

highlighted in red.

Figure 4 Expression patterns of genes involved in seed coat development. (a) Candidate

genes potentially involved in lignin, cellulose, and hemicellulose synthesis; (b) Candidate

genes potentially involved in lignin, cellulose, and hemicellulose degradation. RNAs isolated

from inner and outer seed coats of ‘Dabenzi’ at 50, 95, and 140 DAP, inner seed coat of

‘Baiyushizi’ and ‘Tunisi’ at 50 DAP designated as DB-ISC50, DB-ISC95, DB-ISC140,

DB-OSC50, DB-OSC95, DB-OSC140, BY-ISC50, and T-ISC50, respectively, were

sequenced to survey expression of genes potentially involved in lignin, cellulose, and

hemicellulose synthesis and degradation in pomegranate. Enzyme numbers (EC) correspond

Page 57: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

to the following enzymes: EC 4.3.1.24, phenylalanine ammonia-lyase; EC 1.14.13.11,

trans-cinnamate 4-monooxygenase; EC 6.2.1.12, 4-coumarate--CoA ligase; EC 1.2.1.44,

Cinnamoyl-CoA reductase; EC 1.1.1.195, Cinnamyl-alcohol dehydrogenase; EC 1.11.1.7,

Peroxidase; EC 1.14.13.36, p-coumarate 3-hydroxylase; EC 2.1.1.68, Caffeic acid

3-O-methyltransferase; EC 1.14.-.-, ferulate-5-hydroxylase; EC 2.4.1.12, Cellulose synthase

(CesA,UDP-forming); EC2.4.1.32, beta-mannan synthase; EC 2.4.2.24, 1,4-beta-D-xylan

synthase; EC 1.10.3.2, laccase; EC 2.4.1.43, polygalacturonate

4-alpha-galacturonosyltransferase; EC 2.4.2.39, xyloglucan 6-xylosyltransferase; EC 3.2.1.4,

endoglucanase; EC 3.2.1.24, alpha-mannosidase; EC 3.2.1.6, endo-1,3-beta-D-glucanase; EC

3.2.1.21, beta-glucosidase/beta-xylosidase; EC 3.2.1.55, alpha-N-arabinofuranosidase; EC

3.2.1.78, endo-1,4-beta-mannosidase; EC 3.2.1.114, alpha-mannosidase II; EC 3.2.1.113,

mannosidase 1A; EC 3.2.1.37, xylan 1,4-beta-xylosidase/beta-D-xylosidase; EC 3.2.1.8,

Endo-1,4-beta-xylanase; EC 3.2.1.177, alpha-xylosidase; EC 2.4.1.207, xyloglucan

endotransglucosylase. Detailed gene expression data are provided in Table S15. (c) Candidate

genes potentially involved in integument development. RNAs isolated from flower, inner and

outer seed coats at 50, 95, and 140 DAP of ‘Dabenzi’ designated as DB-F, DB-ISC50,

DB-ISC95, DB-ISC140, DB-OSC50, DB-OSC95, and DB-OSC140, respectively, were used

to survey the expression patterns of genes potentially involved in inner and outer integument

development. The expression of genes that could function to promote development is

indicated in red, and the expression of genes that could function to repress integument

development is indicated in blue. Detailed gene expression data for genes potentially

Page 58: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

involved in integument development are provided in Table S16. Gene expression is

represented as Log2FPKM (fragments per kilobase of exon per million fragments mapped).

Figure 5 Phylogenetic tree of the YABBY gene family and analysis of selection evolution on

INO genes. (a) Phylogenetic tree of the YABBY gene family. The tree was constructed using

the maximum likelihood method and 1000 bootstrap iterations. Bootstrap values supported by

≥50% are shown. Red indicates gene family members from pomegranate cultivar ‘Dabenzi’;

blue indicates gene family members from A. thaliana. (b) Analysis of selection evolution on

INO genes. The site that was subject to selection is indicated in yellow at the 35th amino acid

residue. The sequencing depth for each amino acid is shown in bp above the corresponding

amino acid site. The green arrow is the initial position of the domain. The 16 species used for

this analysis include: Ath, Arabidopsis thaliana; Bna, Brassica napus; Bol, Brassica

oleracea; Bra, Brassica rapa; Car, Cicer arietinum; Cme, Cucumis melo; Egr, Eucalyptus

grandis; Fve, Fragaria vesca; Gma, Glycine max; Mdo, Malus domestica; Pgr, Punica

granatum ‘Dabenzi’; Pmu, Prunus mume; Rco, Ricinus communis; Sin, Sesamum indicum;

Vvi, Vitis vinifera; and Zju, Ziziphus jujuba.

Figure 6 Schematic of pomegranate anthocyanin metabolism and expression profiles of

candidate genes involved in anthocyanin metabolism. RNAs isolated from flowers and outer

seed coats at 50, 95, and 140 DAP of ‘Dabenzi’ indicated as DB-F, DB-OSC50, DB-OSC95,

and DB-OSC140, respectively, were used to survey the expression patterns of genes

including transcription factors that are likely involved in late steps of anthocyanin

metabolism. The numbers in boxes below the photographs of fruit denote total anthocyanin

Page 59: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

content (Unit·g-1 FW) in flowers and outer seed coats. Detailed expression data for genes

related to anthocyanin metabolism are provided in Tables S17 and 18.

Figure 7 Schematic of proposed punicalagin metabolism and expression patterns of candidate

genes encoding each enzyme. Solid black arrows indicate reaction pathways according to

KEGG; dotted black arrows indicate reaction pathways presumed according to the structures

of compounds. Arrows in black indicate that the reaction occurs only in plant cells; arrows in

red indicate that the reaction occurs only in the mammalian intestine; arrows in green indicate

that the reaction occurs in either plants or intestine. Enzyme numbers in black indicate that

there are genes in pomegranate that encode that enzyme; enzyme numbers in grey indicate

that there is no gene encoding that enzyme in pomegranate. Enzyme numbers (EC)

correspond to the following enzymes: EC 1.1.1.25, shikimate dehydrogenase; EC 1.1.1.282,

quinate/shikimate dehydrogenase; EC 1.1.1.24, quinate dehydrogenase; EC 4.2.1.10,

3-dehydroquinate dehydratase; EC 4.2.1.118, dehydroshikimate dehydrogenase; EC 2.1.1.-,

syringate/3-O-methylgallate O-demethylase; EC ? , an enzyme that catalyzes formation of a

dimeric derivative from two gallic acid molecules or from gallic acid and ellagic acid; EC

3.1.1.1, carboxylesterase; EC 2.4.1.x, glycosyltransferase; EC 2.3.1.x, O-galloyltransferase;

EC 3.1.1.21, β-D-glucoside glucohydrolase; EC 3.1.1.20, tannin acetylhydrolase;

EC 4.1.1.59, gallate decarboxylase; and EC 1.17.98.x, dehydroxylase. Samples along the

horizontal axes of heatmaps from left to right are DB-R, DB-L, DB-F, DB-ISC50,

DB-ISC95, DB-ISC140, DB-OSC50, DB-OSC95, DB-OSC140, DB-P50, DB-P95, and

DB-P140, represent root, leaf, flower, and inner and outer seed coats and pericarp at 50, 95,

and 140 DAP of ‘Dabenzi’. Detailed expression data for genes related to punicalagin

metabolism are provided in Table S19.

Page 60: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

Figure 1 Characterization of the pome

egranate genome.

Page 61: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

Figure 2 Phylogenetic tree and expan

nsion and contraction of gene families.

Page 62: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

(a)

(b)

Figure 3 Collinearity patterns between grape, peach, pomegranate and eucalyptus.

Page 63: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le(a)

(b)

Page 64: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

(c)

Figure 4 Expression patterns of genes

s involved in seed coat development.

Page 65: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

This article is protected by copyright. All rights reserved.

(a)

(b)

Figure 5 Phylogenetic tree of the YABBY gene family and selection analysis of INO genes.

Page 66: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

Figure 6 Schematic of pomegranate a

genes involved in anthocyanin metabo

anthocyanin metabolism and expression profiles of

olism.

f

Page 67: The pomegranate (Punica granatum L.) genome and the ...

Acc

epte

d A

rtic

le

Figure 7 Schematic of proposed punic

encoding each enzyme.

calagin metabolism and expression patterns of gen

nes