The draft genome of watermelon (Citrullus lanatus)
and resequencing of 20 diverse accessions
Shaogui Guo1,2,17, Jianguo Zhang3,4,17, Honghe Sun1,2,5,17, Jerome Salse6,17, William J. Lucas7,17,
Haiying Zhang1, Yi Zheng2, Linyong Mao2, Yi Ren1, Zhiwen Wang3, Jiumeng Min3, Xiaosen Guo3,
Florent Murat6, Byung-Kook Ham7, Zhaoliang Zhang7, Shan Gao2, Mingyun Huang2, Yimin Xu2,
Silin Zhong2, Aureliano Bombarely2, Lukas A. Mueller2, Hong Zhao1, Hongju He1, Yan Zhang1,
Zhonghua Zhang8, Sanwen Huang8, Tao Tan9, Erli Pang9, Kui Lin9, Qun Hu10, Hanhui Kuang10,
Peixiang Ni3,4, Bo Wang3, Jingan Liu1, Qinghe Kou1, Wenju Hou1, Xiaohua Zou1, Jiao Jiang1,
Guoyi Gong1, Kathrin Klee11, Heiko Schoof11, Ying Huang3, Xuesong Hu3, Shanshan Dong3,
Dequan Liang3, Juan Wang3, Kui Wu3, Yang Xia1, Xiang Zhao3, Zequn Zheng3, Miao Xing3,
Xinming Liang3, Bangqing Huang3, Tian Lv3, Junyi Wang3, Ye Yin3, Hongping Yi12, Ruiqiang Li13,
Mingzhu Wu12, Amnon Levi14, Xingping Zhang1, James J. Giovannoni2,15, Jun Wang3,16, Yunfu Li1,
Zhangjun Fei2,15 & Yong Xu1
1National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and
Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops
(North China), Beijing, China.
2Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY, USA.
3BGI-Shenzhen, Chinese Ministry of Agriculture, Key Lab of Genomics, Shenzhen, China.
4T-Life Research Center, Fudan University, Shanghai, China.
5College of Plant Science and Technology, Beijing University of Agriculture, Beijing, China.
6INRA, UMR 1095, Genetics, Diversity and Ecophysiology of Cereals, F-63100
Clermont-Ferrand, France.
7Department of Plant Biology, College of Biological Sciences, University of California,
Davis, CA, USA.
8Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing,
China.
9College of Life Sciences, Beijing Normal University, Beijing, China.
10College of Horticulture and Forestry, Huazhong Agricultural University, Wuhan, China.
11INRES Crop Bioinformatics, University of Bonn, Katzenburgweg 2, 53115 Bonn, Germany
Nature Genetics: doi:10.1038/ng.2470
12Xinjiang Academy of Agricultural Sciences, Urumqi, China.
13Beijing Novogene Bioinformation Technology Co. Ltd, Beijing, China
14USDA, ARS, U.S. Vegetable Lab, 2700 Savannah Highway, Charleston, SC, USA.
15USDA Robert W. Holley Center for Agriculture and Health, Tower Road, Ithaca, NY, USA.
16Department of Biology, University of Copenhagen, Copenhagen, Denmark.
17These authors contributed equally to this work.
Correspondence should be addressed to Yong Xu ([email protected]), Zhangjun Fei
([email protected]), Yunfu Li ([email protected]) or Jun Wang ([email protected]).
Table of contents
1. Supplementary Note
S1 Genome sequencing, assembly and quality assessment
S1.1 Whole genome shot-gun sequencing using the Illumina technology
S1.2 De novo assembly of the watermelon genome
S1.3 Evaluation of the effect of sequence depth and large-insert reads on the quality of genome assemblies
S1.4 Unassembled genome evaluation
S1.5 Sequencing of BAC ends and full BAC clones
S1.6 Evaluation of the quality of the assembled watermelon genome
S1.6.1 Gene coverage
S1.6.2 Genome coverage
S1.6.3 Structural correctness of watermelon genome assembly
S2 Genome annotation
S2.1 Repeat annotation
S2.1.1 De novo identification of repeat sequences
S2.1.2 Employment of Repbase for repeat identification
S2.1.3 Classification of de novo TEs
S2.2 Functional annotation of watermelon genes
S2.3 Non-coding RNA (ncRNA) annotation
S3 Watermelon chromosome evolution analysis
S3.1 Dating of paralogous and orthologous gene pairs
S4 Genome resequencing
S4.1 Validation of SNPs and small indels
S4.2 Distribution of SNPs and small indels across the watermelon genome
S4.3 Phylogenetic relationship and population structure analyses
S4.4 Selective sweep analysis
S5 Disease resistance-related genes
S5.1 Identification of disease resistance genes
S5.2 Coverage of watermelon NBS-LRR genes by the genome assembly
S5.3 Watermelon NBS-LRR genes in semi-wild and wild accessions
S6 Comparative analysis of cucurbit phloem sap and vascular transcriptomes
S6.1 Identification of phloem sap transcripts
S6.2 Comparative analysis of phloem sap and vascular transcripts
S7 Regulation of watermelon fruit development and quality
S7.1 Model of sugar accumulation in watermelon fruit flesh
S7.2 Identification and classification of transcription factors
S7.3 Identification of sucrose-controlled upstream open reading frame (SC-uORF) containing bZIP transcription factors
S7.4 MADS box genes in watermelon and cucumber genomes
2. Supplementary References
Supplementary Note
S1 Genome sequencing, assembly and quality assessment
S1.1 Whole genome shot-gun sequencing using the Illumina technology
Illumina short-insert paired-end (clone size: 100, 200, and 400 bp) and large-insert
mate-pair (2, 5, 10 and 20 kb) libraries were prepared following the manufacturer’s
instructions. For construction of mate-pair libraries, DNA circularization, digestion of
linear DNA, fragmentation of circularized DNA, and purification of biotinylated DNA
were performed prior to adapter ligation. The template DNA fragments of the
constructed libraries were hybridized to the surface of flow cells, amplified to form
clusters and then sequenced on the Illumina GAII system, based on the standard
Illumina protocol.
S1.2 De novo assembly of the watermelon genome
In order to achieve a high-quality assembled genome, raw Illumina reads were
processed to remove low-quality reads, adaptor sequences, and possible contaminating
reads of bacterial and viral origin. The resulting high-quality cleaned reads from
libraries with insert sizes ranging from 100 to 400 bp were assembled into contigs
using SOAPdenovo, a de Bruijn graph-based assembly program (ref. 1). Then paired-end
relationships from all PE library reads were used to join contigs into scaffolds. Finally,
we used all short-insert reads (100-400 bp) to fill in gaps within scaffolds.
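The contig step described above rests on a de Bruijn graph. The following is a toy sketch only, not SOAPdenovo's actual implementation: the reads, the value of k and the function names are all hypothetical. It builds a graph whose nodes are (k-1)-mers and whose edges are k-mers, then extends a contig along a path on which every node has exactly one successor.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop forever on a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping reads; real reads would be far longer and more numerous.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)
```

A production assembler additionally handles sequencing errors, repeats (branching nodes) and reverse complements, none of which this sketch attempts.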
S1.3 Evaluation of the effect of sequence depth and large-insert reads on the
quality of genome assemblies
The sequence depth of reads from all short-insert paired-end libraries that we
generated provided 83.7X coverage of the watermelon genome. To investigate the
effect of sequence depth on the watermelon genome assembly, we randomly sampled
data at different depths (20X, 40X, 50X, 60X, 65X, 70X, 75X and 80X) from each lane
and then assembled each subset independently. These assemblies showed that total scaffold
length approached saturation at 50X, growing only slowly at higher depths;
however, N50 scaffold size showed an obvious positive correlation with the sequence
depth (Supplementary Fig. 4a and 4b). We then investigated the effect of
large-insert reads on the watermelon genome assembly. Six independent assemblies
were performed using combinations of cleaned reads from libraries of different insert
sizes: 100-200 bp, 400 bp, 2 kb, 5 kb, 10 kb and 20 kb. The first assembly used only
reads from the 100-200 bp insert libraries; whereas the subsequent assemblies were
performed using reads from the next longer insert libraries combined with reads used
in the previous assembly. Our results indicated that, by including reads from
large-insert libraries, both N50 size (from approximately 25 kb to 2.4 Mb) and total
length (from approximately 290 Mb to 350 Mb) of the watermelon genome assembly
were significantly increased (Supplementary Fig. 4c and 4d). In summary, our
analysis revealed that higher-depth sequencing of the watermelon genome and
the inclusion of reads from large-insert mate-pair libraries substantially improved
assembly efficiency, indicating a positive impact on the cost-benefit ratio of
high-quality genome sequence assembly.
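Because the comparison above is expressed in terms of N50, a minimal sketch of how that statistic is computed may help. The scaffold lengths below are hypothetical and are not taken from the watermelon assembly.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    together contain at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no scaffold lengths given")


# Hypothetical scaffold lengths in kb (total 6,000 kb).
scaffolds_kb = [2400, 1800, 900, 400, 250, 120, 80, 50]
print(n50(scaffolds_kb))
```

Joining contigs with large-insert mate pairs merges many short sequences into fewer long ones, which is why N50 can rise by orders of magnitude while total length changes only modestly.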
S1.4 Unassembled genome evaluation
Unassembled reads were obtained after mapping the cleaned reads to the assembled
watermelon genome using the SOAPaligner with default parameters. Approximately
17.4% of the cleaned reads could not be aligned to the assembly and were therefore
regarded as “unassembled”; the percentage of unassembled reads was largely
consistent with the estimated unassembled portion of the genome (approximately
16.8%; 76.5 Mb out of a total of 425 Mb).
Three lanes of Illumina runs with 75 bp reads, which are suitable for BLAST
searches, were randomly selected from independent DNA libraries to investigate the
properties of unassembled reads. The unassembled reads from these lanes were
re-aligned to the genome assembly using BLASTN with less stringent criteria
(E-value of 1e-10, word size of 20 and low complexity filtering turned off) than
SOAPaligner. The alignments were further filtered and only those with lengths and
identities larger than 60 bp and 80%, respectively, were retained (Supplementary
Table 3). The distribution of “unassembled” reads mapped by BLAST onto individual
chromosomes is shown in Supplementary Fig. 2. This analysis clearly showed that
the majority of these unassembled reads mapped to the centromeric and
pericentromeric regions of the chromosomes. Additionally, in these “unassembled”
regions with high read depths, we were able to determine three repeat units, each of
which shared similarity to sequences related to centromeres, telomeres or 45S rDNA
clusters. Furthermore, FISH analyses confirmed the existence of these three types of
repeats in the watermelon genome (Fig. 1).
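The length and identity filter applied to the BLASTN re-alignments can be sketched as follows. This assumes the alignments are available in the standard 12-column tabular BLAST layout (query, subject, percent identity, alignment length, ...); the output format actually used is not stated in the text, and the example hits are invented.

```python
def filter_hits(tabular_lines, min_length=60, min_identity=80.0):
    """Keep hits whose alignment length and percent identity both exceed the cutoffs."""
    kept = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        identity = float(fields[2])  # column 3: percent identity
        length = int(fields[3])      # column 4: alignment length
        if length > min_length and identity > min_identity:
            kept.append((fields[0], fields[1], identity, length))
    return kept


# Hypothetical hits for three 75-bp reads.
example = [
    "read1\tChr1\t98.67\t75\t1\t0\t1\t75\t1000\t1074\t1e-30\t135",
    "read2\tChr5\t78.50\t70\t15\t0\t1\t70\t500\t569\t1e-12\t60.2",
    "read3\tChr9\t92.00\t50\t4\t0\t1\t50\t42\t91\t1e-11\t58.4",
]
kept = filter_hits(example)
print(kept)
```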
S1.5 Sequencing of BAC ends and full BAC clones
Both ends of 1,152 randomly selected clones from the BAC library of watermelon
inbred line 97103 (ref. 2) were sequenced with an Applied Biosystems 3730xl DNA
Analyzer. We obtained a total of 1,529 high quality sequences for 862 BAC clones,
among which 667 clones had high quality sequences from both ends and 195 clones
had sequences only from one end. The sequences are publicly available at the
Cucurbit Genomics Database (http://www.icugi.org).
Four BAC clones, two located in gene-rich euchromatin regions and the other two
located in highly repetitive centromeric regions, were fully sequenced with an Applied
Biosystems 3730xl DNA Analyzer. The sequences were deposited into GenBank
under accessions JN402338, JN402339, JX027061, and JX027062.
S1.6 Evaluation of the quality of the assembled watermelon genome
S1.6.1 Gene coverage
A total of 1,064,502 watermelon expressed sequence tags (ESTs) collected from
various sources including NCBI dbEST database
(http://www.ncbi.nlm.nih.gov/dbEST/) and the cucurbit genomics database
(http://www.icugi.org), were used to assess the gene coverage of the watermelon
genome assembly. ESTs were aligned to the genome assembly using BLAT (ref. 3). Only
ESTs with alignments of identity ≥ 0.9 and coverage ≥ 0.5 were kept. The analysis
indicated that the genome assembly had a high coverage of gene coding regions
(~97%; Supplementary Table 4).
S1.6.2 Genome coverage
The four fully sequenced watermelon BACs were aligned to the genome assembly
using the NCBI BLAST program with low-complexity filtering turned off (-F F). Only
alignments with sequence identity ≥ 0.98 were kept. The genome assembly covered
97.8% and 97.6% of the two BACs (GenBank accessions: JN402338 and JN402339)
that were located in the gene-rich euchromatin regions, respectively, whereas it
covered only 90.2% and 64.2% of the two BACs (GenBank accessions: JX027061 and
JX027062) that were located in highly repetitive centromeric regions, respectively
(Supplementary Fig. 3). Such low coverage of highly repetitive BAC sequences is
not uncommon, especially for genomes generated using next-generation sequencing
(NGS) technologies. Nonetheless, the high coverage in gene-rich euchromatin regions,
together with our finding (S1.4) that the ~17% of the genome that remains unassembled
consists mainly of repeat sequences, confirmed the high quality of the watermelon
genome assembly.
S1.6.3 Structural correctness of watermelon genome assembly
One common error in de novo genome assemblies is that two contigs are incorrectly
joined into one scaffold, resulting in local assembly errors. The alignments of the four
full BAC sequences did not identify any local assembly errors (Supplementary Fig.
3). To further study the structural correctness of the watermelon genome assembly, the
paired-end sequences of 667 BAC clones were aligned to the scaffolds using
BLASTN. In order to ensure unambiguous mapping, only sequences of at least 300 nt
that aligned to a unique location with a coverage of 95% or more and an identity of
99% or better were used. In total, 341 (51.1%) of the BAC end sequence pairs could
be aligned to a unique position on the scaffolds with these stringent criteria. Pairs of
end sequences that aligned to a single scaffold with incorrect orientation (i.e., both
end sequences aligned to the same strand), or at a too large distance from each other
(more than 200 kb) were considered indicators of potential assembly errors. Out of the
302 pairs aligned to the same scaffold, none were aligned inconsistently with the
genome assembly (Supplementary Table 5).
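The consistency criteria for BAC end pairs can be sketched as a small classifier. The `EndHit` record, the function name and the coordinates are hypothetical; only the two rules (same strand counts as incorrect orientation, and a separation above 200 kb counts as too distant) come from the text.

```python
from dataclasses import dataclass

MAX_SPAN = 200_000  # bp; pairs farther apart than this are flagged


@dataclass
class EndHit:
    scaffold: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_pair(a, b, max_span=MAX_SPAN):
    """Classify one pair of uniquely aligned BAC end sequences."""
    if a.scaffold != b.scaffold:
        return "different_scaffolds"
    if a.strand == b.strand:
        return "wrong_orientation"  # both ends on the same strand
    span = max(a.end, b.end) - min(a.start, b.start)
    if span > max_span:
        return "too_far_apart"
    return "consistent"


# Hypothetical pair: opposite strands, about 120 kb apart on one scaffold.
result = classify_pair(
    EndHit("scaffold1", 1_000, 1_500, "+"),
    EndHit("scaffold1", 120_000, 120_500, "-"),
)
print(result)
```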
Although our analysis of the relatively small set of full BAC and BAC end sequences
identified no local assembly errors, our genome assembly would inevitably contain a
small proportion of such errors, as is common in all genome assembly projects.
Indeed, during our scaffold anchoring based on the
high-density genetic map, we did find two scaffolds with total size 5.76 Mb (1.6% of
the assembled genome) that were not consistent with the genetic map. However, at
this stage we could not determine whether the inconsistency is due to genome
assembly errors or to errors in the genetic map. Nonetheless, our analysis confirmed
the overall structural correctness of scaffolds of the watermelon genome assembly.
Furthermore, out of the 39 BAC end sequence pairs mapped to different scaffolds,
two were aligned to different chromosomes, indicating potential errors in scaffold
anchoring or errors caused by potential chimeric BAC clones (Supplementary Table
6).
In summary, our extensive analyses confirmed the high quality of the de novo
watermelon genome assembly.
S2 Genome annotation
S2.1 Repeat annotation
S2.1.1 De novo identification of repeat sequences
We first used PILER (ref. 4) and RepeatScout (ref. 5) for repeat sequence identification
in the watermelon genome assembly. LTR retrotransposons were identified with
LTR_FINDER (ref. 6) with default parameters. All repeat sequences with lengths >100 bp
and less than 5% gap ("N") bases constituted the raw transposable element (TE) library.
Second, we used all-versus-all BLASTN (E-value ≤ 1e-10) to search against the raw
TE library, and redundant sequences were removed when two repeats
aligned with identity ≥ 80%, coverage ≥ 80% and minimal matching length ≥ 100 bp;
this yielded a non-redundant TE library. Next, all non-redundant repeats were
searched against the SwissProt protein database to filter out protein-coding genes by
BLASTX (E-value ≤ 1e-4, identity ≥ 30%, coverage ≥ 30% and the minimal matching
length ≥30 aa). After manual correction, a de novo TE library for the watermelon
genome was obtained. RepeatClassifier was then used to classify repeat models for
the de novo TE library to establish a final classified de novo TE library. Finally, we
used RepeatProteinMask and RepeatMasker (http://www.repeatmasker.org) with the
final classified de novo TE library to search the assembled genome to locate TE loci.
S2.1.2 Employment of Repbase for repeat identification
We used RepeatProteinMask, RepeatMasker (http://www.repeatmasker.org) and the
known Repbase library (http://www.girinst.org/repbase/index.html) to find TE repeats
in the assembled genome. TEs were identified both at the DNA and protein level.
RepeatMasker was applied for DNA-level identification using a custom library (a
combination of Repbase, plant repeat database and our genome de novo TE library).
At the protein level, RepeatProteinMask was used to perform WU-BLASTX against
the TE protein database. Overlapping TEs belonging to the same type of repeats were
integrated, whereas those with low scores were removed if they overlapped > 80%
and belonged to different types.
S2.1.3 Classification of de novo TEs
A hierarchical system was used to classify de novo TEs. This system involved the
following steps: 1) BLASTN against Repbase; 2) BLASTX against TE proteins; 3)
BLASTN against plant/animal repeat databases; 4) BLASTX against SwissProt
proteins; 5) TBLASTX against Repbase; and 6) TBLASTX against plant/animal
repeat databases. For each step, TEs having significant hits with known repeats were
assigned a type either at the DNA level (E-value ≤ 1e-10, identity ≥ 80%, coverage
≥30% and the minimal matching length ≥ 80 bp) or at the protein level (E-value ≤1e-4,
identity ≥ 30%, coverage ≥ 30% and the minimal matching length ≥ 30 aa). LTR
retrotransposons identified by LTR_FINDER were classified as "unclassified LTR" if
they had no homology to known repeats.
In order to understand the evolution and bursts of LTR retrotransposons, we
aligned the two long terminal repeats of each LTR retrotransposon. Then the divergence of the
LTR pairs was calculated using the distmat program implemented in the EMBOSS
package with the Kimura two-parameter model. Finally, the insertion time T was
calculated as T = K/r, with r as the rate of nucleotide substitution and K as the distance.
The molecular clock was set as 6.5 × 10^-9 substitutions per site per year (ref. 7).
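The dating calculation can be illustrated with a short sketch. It computes the Kimura two-parameter distance directly from a pair of aligned sequences and then applies T = K/r exactly as written above. This is a stand-alone illustration, not the EMBOSS distmat implementation, and the two 20-bp sequences are invented.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance: K = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)),
    where P and Q are the proportions of transitions and transversions."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_time(k, rate=6.5e-9):
    """Insertion time in years, T = K / r, using the rate quoted in the text."""
    return k / rate


# Hypothetical aligned LTR pair differing by a single transition (A -> G).
ltr_5prime = "ACGTACGTACGTACGTACGT"
ltr_3prime = "GCGTACGTACGTACGTACGT"
k = k2p_distance(ltr_5prime, ltr_3prime)
print(k, insertion_time(k))
```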
S2.2 Functional annotation of watermelon genes
The predicted watermelon genes were compared to SwissProt, TrEMBL and
Arabidopsis protein databases (ref. 8) using NCBI BLASTP (E-value ≤ 1e-4). Functional
domains of watermelon genes were identified by comparing their sequences against
protein databases including Pfam, PRINTS, PROSITE, ProDom, and SMART using
InterProScan (ref. 9). Gene Ontology (GO) terms for each gene were obtained from the
corresponding InterPro entries. Based on the results from BLASTP and InterProScan,
and GO term information for each gene, functions of predicted watermelon genes
were assigned using the AHRD pipeline (Automated assignment of Human Readable
Descriptions) as described previously (ref. 10).
S2.3 Non-coding RNA (ncRNA) annotation
tRNA genes were identified by tRNAscan-SE (ref. 11) with default parameters. The C/D box
snoRNAs were identified by Snoscan (ref. 12). Other ncRNAs, including miRNAs, snRNAs,
and H/ACA box snoRNAs, were identified using INFERNAL software by searching
against the Rfam (ref. 13) database with default parameters.
S3 Watermelon chromosome evolution analysis
S3.1 Dating of paralogous and orthologous gene pairs
We performed sequence divergence as well as speciation event dating analysis based
on the rate of nonsynonymous (Ka) vs. synonymous (Ks) substitutions, calculated
with MEGA5 (ref. 14). The average substitution rate (r) of 6.5 × 10^-9 substitutions per
synonymous site per year was used to calibrate the age of the considered genes (ref. 7). The
time (T) since gene insertion was then estimated using the formula T = Ks/r (refs. 15,16).
S4 Genome resequencing
S4.1 Validation of SNPs and small indels
To confirm the SNP and small indel calling, a total of 75 non-overlapping genome
regions (~50 kb) were randomly selected in each of the 20 watermelon accessions,
PCR amplified, and sequenced with an Applied Biosystems 3730xl DNA Analyzer.
The resulting sequences were first processed to remove low quality regions (phred
quality score < 30) and then aligned to the reference 97103 genome by BWA (ref. 17).
S4.2 Distribution of SNPs and small indels across the watermelon genome
The majority of the 6.8 million SNPs (88.9%) we identified were located in intergenic
regions, whereas only 2.9% were in coding regions. The ratio of nonsynonymous
(97,933) to synonymous (94,225) substitutions was 1.04, which is higher than that of
Arabidopsis (0.83; ref. 18) but lower than that of soybean (1.61; ref. 19) and rice
(1.29; ref. 20). Of the
965,006 indels, 88.3% were located in intergenic regions, whereas only 0.57% (5,531)
were located in coding regions.
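As a quick arithmetic check, the ratio and percentage quoted above can be reproduced from the counts given in this section. The variable names are ours.

```python
# Counts quoted in the text.
nonsynonymous, synonymous = 97_933, 94_225
coding_indels, total_indels = 5_531, 965_006

ratio = nonsynonymous / synonymous
coding_indel_pct = 100 * coding_indels / total_indels

print(round(ratio, 2), round(coding_indel_pct, 2))
```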
S4.3 Phylogenetic relationship and population structure analyses
The neighbor-joining tree contained four major groups, corresponding to the
cultivated C. lanatus subsp. vulgaris East-Asia ecotype and America ecotype, C.
lanatus subsp. mucosospermus and C. lanatus subsp. lanatus (Fig. 3a). PCA indicated
that the C. lanatus subsp. lanatus group was clearly separated from other groups using
the first and second eigenvectors (Fig. 3b). The wide dispersal of the C. lanatus subsp.
lanatus group indicated its higher diversity. In our samples, C. lanatus subsp. vulgaris
was very closely related to C. lanatus subsp. mucosospermus. The two cultivated
subgroups, C. lanatus subsp. vulgaris East-Asia ecotype and America ecotype, are
nearly indistinguishable, supporting the low level of genetic diversity associated with
cultivated watermelon.
Additional analysis of population structure was performed using the FRAPPE
program (ref. 21). Here, we analyzed the data by increasing K (the number of populations)
from 2 to 5 (Fig. 3c). For K = 2, we identified a division between C. lanatus subsp.
lanatus and the other 17 accessions. Based on K = 3, the 21 watermelon accessions
were clearly divided into three groups, including C. lanatus subsp. lanatus, C. lanatus
subsp. vulgaris and C. lanatus subsp. mucosospermus. Using K = 4, C. lanatus subsp.
vulgaris was divided into two subgroups, East-Asia ecotype and America ecotype,
and for K = 5, a new subgroup within the C. lanatus subsp. mucosospermus group
emerged.
S4.4 Selective sweep analysis
In addition to potential selective sweeps, we also identified regions with the lowest
1% of π_mucosospermus/π_vulgaris values. In contrast to selective sweeps, these regions
represent those with significantly higher levels of polymorphism in cultivated watermelon
C. lanatus subsp. vulgaris compared with C. lanatus subsp. mucosospermus; thus they can
serve as a negative control for selective sweeps. A total of 95 regions of 7.19 Mb in size,
containing 477 genes, were identified (Supplementary Table 17). GO term analysis
indicated that only a few biological processes were enriched in those 477 genes
when compared to the whole genome, and none of them were associated with known
selected traits (Supplementary Table 18). As expected, this is in contrast to genes in
potential selective sweeps that were highly enriched with biological processes related
to important selected traits (Supplementary Table 16).
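Selecting the windows with the lowest diversity ratios can be sketched as below. The window identifiers and π values are invented for illustration, and the window size and step used in the actual analysis are not restated here.

```python
def lowest_ratio_windows(windows, percent=1):
    """Return the `percent` percent of windows with the lowest
    pi_mucosospermus / pi_vulgaris ratio.

    windows: iterable of (window_id, pi_mucosospermus, pi_vulgaris).
    Windows with pi_vulgaris == 0 are skipped to avoid division by zero
    (an assumption of this sketch).
    """
    ratios = [(wid, pi_m / pi_v) for wid, pi_m, pi_v in windows if pi_v > 0]
    ratios.sort(key=lambda item: item[1])
    n_keep = max(1, len(ratios) * percent // 100)
    return ratios[:n_keep]


# 200 hypothetical windows; the ratio decreases as the window index increases.
windows = [(f"w{i}", 0.001 + 0.00001 * (199 - i), 0.001) for i in range(200)]
lowest = lowest_ratio_windows(windows)
print([wid for wid, _ in lowest])
```

Sorting in the opposite direction would give the highest ratios, that is, the candidate selective sweeps with which these control regions are contrasted.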
S5 Disease resistance-related genes
S5.1 Identification of disease resistance genes
Genes encoding nucleotide-binding site (NBS) and leucine-rich repeat (LRR)
domains were identified following a two-step process. First, watermelon protein
sequences were screened against the Pfam database for the presence of the NBS
(NB-ARC) family domain (PF00931; ref. 22) using the HMMER3 program
(http://hmmer.janelia.org). Second, conserved motifs were derived from the
domain profiles retrieved from the Pfam and SMART
(http://smart.embl-heidelberg.de) databases and from the COILS Server (ref. 23) with a
probability ≥ 90% to detect CC domains specifically.
Lipoxygenase (LOX) proteins were identified by comparing watermelon protein
sequences to the InterPro database to search for the lipoxygenase domain (IPR001024
or IPR000907).
To identify LRR-containing receptor-like proteins (RLPs) and receptor-like
kinases (RLKs), the TMHMM Server (http://www.cbs.dtu.dk/services/TMHMM) was
used to search for trans-membrane domains in watermelon protein sequences. Then,
the sequences were compared to the Pfam domain database to search for the LRR
domain (PF00560) and protein kinase domain (PF00069).
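One way to combine these search results is sketched below. The decision rule is an assumption on our part, since the text lists the searches but not the exact rule: a protein with a trans-membrane domain and an LRR domain is called an RLK if it also carries the protein kinase domain, and an RLP otherwise. The function name and inputs are hypothetical.

```python
LRR = "PF00560"      # Pfam LRR domain named in the text
PKINASE = "PF00069"  # Pfam protein kinase domain named in the text


def classify_receptor(pfam_hits, has_tm_domain):
    """Assumed rule: TM + LRR + kinase -> RLK; TM + LRR without kinase -> RLP."""
    hits = set(pfam_hits)
    if not has_tm_domain or LRR not in hits:
        return "other"
    return "RLK" if PKINASE in hits else "RLP"


print(classify_receptor(["PF00560", "PF00069"], has_tm_domain=True))
print(classify_receptor(["PF00560"], has_tm_domain=True))
```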
S5.2 Coverage of watermelon NBS-LRR genes by the genome assembly
The watermelon genome assembly contains considerably fewer NBS-LRR genes (44)
than other plant species such as rice, apple and maize. In this study we sought to
check whether this low number of identified NBS-LRR genes is due to the incomplete
coverage by the genome assembly of genes from this family. We first used BLAST to
compare the 44 NBS-LRR protein sequences against a watermelon EST dataset with a
low-stringency cutoff (E-value < 1e-5) to identify potential NBS-LRR genes in this EST
collection, which contains ~75K unigenes assembled from ~600K EST sequences, with the
majority being generated using the 454 sequencing technology (http://www.icugi.org;
ref. 24). From this analysis, we obtained a total of 27 unigenes, among which 23
matched the 44 predicted NBS-LRR genes. Alignments of these 23 unigenes to the NBS-LRR
genes indicated that they all have at least 99% identity once homopolymer
errors in the unigene sequences are removed. Detailed examination of the remaining four unigenes
indicated that they are all covered by the watermelon genome assembly, with three
(WMU79003, WMU36300 and WMU43890) covered by predicted genes (Cla019850,
Cla019073 and Cla015037, respectively). These three genes (two Leucine Rich
Repeat family proteins and one LRR receptor-like serine/threonine-protein kinase) all
lack the typical NB-ARC domain found in R proteins. Examination of the
corresponding genomic region of the unigene (WMU10848) that does not correspond
to any predicted genes indicated that the genomic region contains no open reading
frames (ORFs), suggesting it is probably a pseudogene. The fact that all 23 NBS-LRR
unigenes identified in the EST collection are covered by the genome assembly indicates
that NBS-LRR genes are very unlikely to be missing from the genome
assembly.
S5.3 Watermelon NBS-LRR genes in semi-wild and wild accessions
We checked the presence of the 44 NBS-LRR genes in the genomes of the
semi-wild/wild accessions by aligning the genome resequencing reads to these 44
NBS-LRR genes. We found that all the 44 NBS-LRR genes are present in semi-wild
C. lanatus subsp. mucosospermus accessions and only one gene, Cla012424, is absent
in PI296341-FR, PI482276, and PI482303, all belonging to the wild C. lanatus subsp.
lanatus, but present in PI482326, another C. lanatus subsp. lanatus accession.
S6 Comparative analysis of cucurbit phloem sap and vascular
transcriptomes
S6.1 Identification of phloem sap transcripts
The watermelon and cucumber vascular bundle transcriptomes represent those
mRNAs being expressed in the cambium, the companion cells as well as in the
phloem and xylem parenchyma. The phloem transcriptomes represent mRNAs that
are contained in the phloem sap collected from these plants. This sap represents the
phloem translocation stream that is carried by the enucleate sieve tube system. Thus,
the phloem sap transcriptome represents a unique population of transcripts present
within mature enucleate sieve elements that are generated by their nucleate
companion cells. For our studies, the phloem transcriptomes contain only those
transcripts that were found to be enriched with at least two-fold higher levels in the
phloem sap compared to the vascular tissues. Excluding transcripts below
this two-fold enrichment threshold removes those that could have entered
the phloem sap as contaminants from the surrounding tissues. In the present study, we detected ~1,000
transcripts in the phloem sap of cucumber and watermelon. This mRNA population is
about 10 times smaller than that for the complex vascular tissues, and this is to be
expected as the mature sieve elements form a highly specialized conduit for nutrient
delivery.
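The two-fold enrichment criterion can be sketched as a simple filter. The transcript identifiers and expression values are hypothetical, and the handling of transcripts with zero vascular expression (kept when detected in the sap) is an assumption of this sketch rather than something stated in the text.

```python
def phloem_enriched(expression, min_fold=2.0):
    """Return transcript IDs whose phloem sap level is at least `min_fold`
    times the vascular tissue level.

    expression: dict mapping transcript ID -> (sap_level, vascular_level).
    """
    enriched = []
    for transcript_id, (sap, vascular) in expression.items():
        if vascular == 0:
            # Assumption: detected in sap but absent from vascular tissue counts as enriched.
            if sap > 0:
                enriched.append(transcript_id)
            continue
        if sap / vascular >= min_fold:
            enriched.append(transcript_id)
    return enriched


# Hypothetical normalized expression levels.
levels = {"t1": (40.0, 10.0), "t2": (12.0, 10.0), "t3": (5.0, 0.0), "t4": (0.0, 3.0)}
enriched = phloem_enriched(levels)
print(enriched)
```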
S6.2 Comparative analysis of phloem sap and vascular transcripts
We compared vascular transcripts and phloem sap transcripts between watermelon and
cucumber using BLASTP with an E-value cutoff of 1e-5. We also compared the
whole gene sets between watermelon and cucumber (Supplementary Table 27). At
all E-value cutoffs tested, vascular transcripts always had significantly more pairs
between watermelon and cucumber than was found for the comparison of the whole
gene sets for these two cucurbits. The converse was true for phloem sap
transcripts of watermelon and cucumber; here, there were significantly fewer pairs than
for the whole gene sets (p values of all chi-square tests were <0.0001). This analysis
indicated that transcripts in vascular bundles were highly conserved between
watermelon and cucumber, whereas those in the phloem translocation stream
(sampled by the sap) were highly divergent.
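A chi-square test of this kind can be sketched for a 2 × 2 table (transcript set versus whole gene set, with versus without a cross-species pair). The counts below are invented; the real values are in Supplementary Table 27. The sketch uses the Pearson statistic without continuity correction, which may differ from the exact procedure used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    for the table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: phloem sap transcripts (with pair, without pair)
# versus whole gene set (with pair, without pair).
chi2, p_value = chi_square_2x2(300, 700, 15_000, 8_000)
print(chi2, p_value)
```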
S7 Regulation of watermelon fruit development and quality
S7.1 Model of sugar accumulation in watermelon fruit flesh
Through strand-specific RNA-seq analysis, we identified a total of 13 sugar metabolic
enzyme coding genes that were differentially expressed during flesh development and
also between flesh and rind tissues (at least two-fold differential expression and
FDR<0.01). These 13 genes were distributed in seven enzyme categories
(Supplementary Table 39). AGA (α-galactosidase) and IAI (insoluble acid invertase)
are known to determine plant sink strength by regulating photosynthate unloading into
fruits25-27. The up-regulation of an AGA gene, Cla006123, and an IAI gene, Cla020872,
during watermelon flesh development, and their significantly higher expression levels
in high-sugar fruit flesh than in low-sugar rind, indicated their important roles in
regulating the unloading and utilization of the translocated RFOs (raffinose family
oligosaccharides). The breakdown products of RFOs, sucrose and galactose, are
metabolized and used for further energy metabolism in the cytoplasm via a complex
enzyme-mediated network. A vacuolar SAI (soluble acid invertase) gene, Cla002328, was
found to be down-regulated as sugar accumulated in fruit flesh.
A significant negative correlation has been observed between SAI activity and
sucrose accumulation in melon28, tomato29 and sugarcane30. The decreased expression
of Cla002328 would reduce the rate of sucrose catabolism and lead to the accumulation
of sucrose at high concentration in the vacuole, the main sugar-storing organelle in
watermelon fruit flesh. A UGGP (UDP-galactose/glucose pyrophosphorylase) gene,
Cla013902, was up-regulated during watermelon flesh development, indicating its key
role in fruit sink metabolism, based on the function of UGGP in catalyzing the
synthesis of UDP-glucose/UDP-galactose reported in melon31. Finally, the
differentially expressed NI (neutral invertase) gene, Cla021809, SPS (sucrose
phosphate synthase) gene, Cla011923, and UGE (UDP-glucose 4-epimerase) genes,
Cla009857 and Cla012809, may contribute to fruit sucrose catabolism and
utilization32, to maintaining the sucrose metabolism cycle33 and to providing
substrates for cell wall biosynthesis and growth34 during watermelon fruit flesh
development.
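The selection criterion stated at the start of this section (at least two-fold differential expression and FDR<0.01) can be sketched as a simple filter. The function name and the example records are hypothetical, and the sketch covers a single comparison only; the analysis described here required differential expression both during flesh development and between flesh and rind.

```python
import math

def is_differentially_expressed(fold_change, fdr, min_fold=2.0, max_fdr=0.01):
    """Return True if a gene changes at least min_fold in either direction with FDR below max_fdr.

    fold_change is an expression ratio and must be positive.
    """
    return abs(math.log2(fold_change)) >= math.log2(min_fold) and fdr < max_fdr

# Hypothetical records: (gene identifier, fold change, FDR).
records = [
    ("geneA", 4.0, 0.001),   # up-regulated, significant
    ("geneB", 0.4, 0.005),   # down-regulated 2.5-fold, significant
    ("geneC", 1.5, 0.0001),  # significant, but below two-fold
    ("geneD", 3.0, 0.05),    # above two-fold, but FDR too high
]
selected = [gene for gene, fc, fdr in records if is_differentially_expressed(fc, fdr)]
print(selected)
```

Working on the log2 scale treats up- and down-regulation symmetrically, so a ratio of 0.5 counts as two-fold just as a ratio of 2 does.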
Sugar transporters are necessary for transmembrane sugar transport and
partitioning35. A total of 14 sugar transporter genes were found to be differentially
expressed as sugar accumulated in flesh tissue and between
high-sugar fruit flesh and low-sugar rind (Supplementary Table 40). We propose that
they play important roles in sugar accumulation in the fruit flesh of watermelon, as
they do in the fruit of tomato36 and grapevine37.
In summary, our model provides genomic insight into the complex
gene network involved in sugar unloading, metabolism and partitioning during sugar
accumulation in watermelon fruit flesh.
S7.2 Identification and classification of transcription factors
Transcription factors (TFs) were identified and classified into different families using
the iTAK program (http://bioinfo.bti.cornell.edu/tool/itak). The program first
compared watermelon protein sequences against the Pfam domain database22 using
the HMMER3 program (http://hmmer.janelia.org). Proteins containing corresponding
DNA-binding domains were identified as TFs and further classified into different
families, based on the rules described in Perez-Rodriguez et al.38. The same pipeline was
also applied to other plant genomes, including cucumber, Arabidopsis, rice, grape,
poplar, papaya, sorghum, soybean, Brachypodium, maize, apple, cacao, strawberry,
and castor bean. TFs identified from these plant genomes are available at
http://bioinfo.bti.cornell.edu/cgi-bin/itak/db_home.cgi.
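The rule-based classification step can be pictured with a small sketch. It assumes that each family is defined by Pfam domains that must be present and domains that must be absent. The two rules below are illustrative placeholders only and are not the actual rules used by iTAK or described in reference 38; the function name is likewise hypothetical.

```python
# Illustrative rules only (NOT the real iTAK/PlnTFDB rule set): each family lists
# Pfam domains that must all be present and domains that must all be absent.
FAMILY_RULES = {
    "bZIP": {"required": {"bZIP_1"}, "forbidden": {"HLH"}},
    "MADS": {"required": {"SRF-TF"}, "forbidden": set()},
}

def classify_tf(domains, rules=FAMILY_RULES):
    """Return the sorted list of families whose rules are satisfied by a protein's domain hits."""
    domains = set(domains)
    return sorted(
        family
        for family, rule in rules.items()
        if rule["required"] <= domains and not (rule["forbidden"] & domains)
    )

# Domain hits would come from an HMMER3 search against Pfam; here they are typed in by hand.
print(classify_tf({"SRF-TF", "K-box"}))
print(classify_tf({"Pkinase"}))
```

A protein with no matching rule returns an empty list and would not be counted as a TF under this scheme.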
Within the watermelon genome, we identified a total of 1,448 putative
transcription factor (TF) genes, distributed in 59 families. The number of identified
TFs is among the lowest in the sequenced plant genomes, though comparable to 1,412
and 1,407 for cucumber and grape, respectively (Supplementary Table 41).
S7.3 Identification of sucrose-controlled upstream open reading frame
(SC-uORF) containing bZIP transcription factors
To identify SC-uORF containing bZIP transcription factors in the watermelon genome,
we first extracted 2 kb sequences that are upstream of translation start sites (ATG) of
each of the 23,440 predicted protein coding genes. These 2 kb upstream sequences
were then translated into protein sequences using the transeq program in the
EMBOSS package (http://emboss.sourceforge.net), with standard codon usage table
and in three forward frames. The resultant peptide sequences were scanned for the
presence of the conserved SC-uORF motif ([I/L/F][L/M/V/S][H/Q/L][S][F][S][V][V]
[F/Y][L][Y][W/Y][F/T/L][Y][N/V][I/F/V][S]) as reported in previous studies39,40. Peptide
sequences containing the SC-uORF motif were then manually checked to confirm
that start and stop codons flank the matched sites. Finally, a total of
four bZIP transcription factors containing the SC-uORF motif, Cla014247, Cla022469,
Cla014572 and Cla017361, were identified in the watermelon genome.
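The scan described above can be sketched in Python. This is a minimal stand-in, not the pipeline actually used: it replaces the EMBOSS transeq step with a small built-in translator (standard codon table, three forward frames) and writes the SC-uORF motif as a regular expression. The function names are illustrative, and the manual check for flanking start and stop codons is not reproduced.

```python
import re
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The SC-uORF motif from the text, rewritten as a regular expression.
SC_UORF = re.compile(r"[ILF][LMVS][HQL]SFSVV[FY]LY[WY][FTL]Y[NV][IFV]S")

def translate(dna, frame):
    """Translate one forward frame (0, 1 or 2); codons with non-ACGT bases become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3)
    )

def scan_upstream(dna):
    """Return (frame number 1-3, peptide offset, matched peptide) for every motif hit."""
    hits = []
    for frame in range(3):
        peptide = translate(dna, frame)
        for match in SC_UORF.finditer(peptide):
            hits.append((frame + 1, match.start(), match.group()))
    return hits
```

In use, `scan_upstream` would be called on each 2 kb upstream sequence, and any hit would then be inspected for an upstream start codon and a downstream stop codon in the same frame.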
S7.4 MADS box genes in watermelon and cucumber genomes
One notable feature of both the watermelon and cucumber genomes is that they contain
far fewer MADS-box transcription factors than most of the other sequenced plant
genomes (Supplementary Table 41). Protein sequences of watermelon, cucumber,
and Arabidopsis MADS-box genes, as well as tomato LeMADS-RIN and TAGL1, were
aligned using ClustalW (http://www.clustal.org). A neighbor-joining phylogenetic
tree of MADS-box proteins was then constructed from the alignment with 1,000
bootstrap replicates. The phylogenetic analysis identified two MADS family clades
that appear to be completely lost in both watermelon and cucumber genomes
(Supplementary Fig. 15). The first includes FLC (FLOWERING LOCUS C) and five related
MAF (MADS AFFECTING FLOWERING) genes, which are negative regulators of floral
development41-43. The absence of these genes from the watermelon and cucumber genomes
suggests that these two species may have different pathways for regulating floral
development, possibly related to the monoecious nature of their flowers. The second
clade that is absent from these two genomes contains a large group of 18 Type I
Arabidopsis MADS-box TFs, whose functions remain unclear although they are
reported to evolve and be lost more quickly during evolution41.
Supplementary References
1. Li, R. et al. De novo assembly of human genomes with massively parallel short read
sequencing. Genome Res. 20, 265–272 (2010).
2. Joobeur, T. et al. Construction of a watermelon BAC library and identification of SSRs
anchored to melon or Arabidopsis genomes. Theor. Appl. Genet. 112, 1553–1562 (2006).
3. Kent, W.J. BLAT--the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
4. Edgar, R.C. & Myers, E.W. PILER: identification and classification of genomic repeats.
Bioinformatics 21, 152–158 (2005).
5. Price, A.L., Jones, N.C. & Pevzner, P.A. De novo identification of repeat families in large
genomes. Bioinformatics 21, 351–358 (2005).
6. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR
retrotransposons. Nucleic Acids Res. 35, 265–268 (2007).
7. Hu, T.T. et al. The Arabidopsis lyrata genome sequence and the basis of rapid genome size
change. Nat. Genet. 43, 476–481 (2011).
8. Boutet, E., Lieberherr, D., Tognolli, M., Schneider, M. & Bairoch, A. UniProtKB/Swiss-Prot.
Methods Mol. Biol. 406, 89–112 (2007).
9. Mulder, N. & Apweiler, R. InterPro and InterProScan: tools for protein sequence classification
and comparison. Methods Mol. Biol. 396, 59–70 (2007).
10. The Tomato Genome Sequencing Consortium. The tomato genome sequence provides insights
into fleshy fruit evolution. Nature 485, 635–641 (2012).
11. Lowe, T.M. & Eddy, S.R. tRNAscan-SE: a program for improved detection of transfer RNA
genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
12. Lowe, T.M. & Eddy, S.R. A computational screen for methylation guide snoRNAs in yeast.
Science 283, 1168–1171 (1999).
13. Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic
Acids Res. 33, 121–124 (2005).
14. Tamura, K. et al. MEGA5: Molecular Evolutionary Genetics Analysis using Maximum
Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol. Biol. Evol. 28,
2731–2739 (2011).
15. Murat, F. et al. Ancestral grass karyotype reconstruction unravels new mechanisms of genome
shuffling as a source of plant evolution. Genome Res. 20, 1545–1557 (2010).
16. Salse, J. et al. Reconstruction of monocotelydoneous proto-chromosomes reveals faster
evolution in plants than in animals. Proc. Natl. Acad. Sci. USA 106, 14908–14913 (2009).
17. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform.
Bioinformatics 25, 1754–1760 (2009).
18. Clark, R.M. et al. Common sequence polymorphisms shaping genetic diversity in Arabidopsis
thaliana. Science 317, 338–342 (2007).
19. Lam, H.M. et al. Resequencing of 31 wild and cultivated soybean genomes identifies patterns
of genetic diversity and selection. Nat. Genet. 42, 1053–1059 (2010).
20. Xu, X. et al. Resequencing 50 accessions of cultivated and wild rice yields markers for
identifying agronomically important genes. Nat. Biotech. 30, 105–111 (2012).
21. Tang, H., Peng, J., Wang, P. & Risch, N.J. Estimation of individual admixture: analytical and
study design considerations. Genet. Epidemiol. 28, 289–301 (2005).
22. Finn, R.D. et al. The Pfam protein families database. Nucleic Acids Res. 38, 211–222 (2010).
23. Lupas, A., Van Dyke, M. & Stock, J. Predicting coiled coils from protein sequences. Science
252, 1162–1164 (1991).
24. Guo, S. et al. Characterization of transcriptome dynamics during watermelon fruit
development: sequencing, assembly, annotation and gene expression profiles. BMC Genomics
12, 454 (2011).
25. Godt, D.E. & Roitsch, T. Regulation and tissue-specific distribution of mRNAs for three
extracellular invertase isoenzymes of tomato suggests an important function in establishing
and maintaining sink metabolism. Plant Physiol. 115, 273–282 (1997).
26. Gao, Z. & Schaffer, A.A. A novel alkaline alpha-galactosidase from melon fruit with a
substrate preference for raffinose. Plant Physiol. 119, 979–988 (1999).
27. Carmi, N. et al. Cloning and functional expression of alkaline alpha-galactosidase from melon
fruit: similarity to plant SIP proteins uncovers a novel family of plant glycosyl hydrolases.
Plant J. 33, 97–106 (2003).
28. Schaffer, A.A., Aloni, B. & Fogelman, E. Sucrose metabolism and accumulation in
developing fruit of Cucumis. Phytochemistry 26, 1883–1887 (1987).
29. Yelle, S., Chetelat, R.T., Dorais, M., Deverna, J.W. & Bennett, A.B. Sink Metabolism in
tomato fruit: IV. genetic and biochemical analysis of sucrose accumulation. Plant Physiol. 95,
1026–1035 (1991).
30. Zhu, Y.J., Komor, E. & Moore, P.H. Sucrose accumulation in the sugarcane stem is regulated
by the difference between the activities of soluble acid invertase and sucrose phosphate
synthase. Plant Physiol. 115, 609–616 (1997).
31. Dai, N. et al. Cloning and expression analysis of a UDP-galactose/glucose pyrophosphorylase
from melon fruit provides evidence for the major metabolic pathway of galactose metabolism
in raffinose oligosaccharide metabolizing plants. Plant Physiol. 142, 294–304 (2006).
32. Roitsch, T. & González, M.-C. Function and regulation of plant invertases: sweet sensations.
Trends Plant Sci. 9, 606–613 (2004).
33. Nguyen-Quoc, B. & Foyer, C.H. A role for 'futile cycles' involving invertase and sucrose
synthase in sucrose metabolism of tomato fruit. J. Exp. Bot. 52, 881–889 (2001).
34. Rosti, J. et al. UDP-glucose 4-epimerase isoforms UGE2 and UGE4 cooperate in providing
UDP-galactose for cell wall biosynthesis and growth of Arabidopsis thaliana. Plant Cell 19,
1565–1579 (2007).
35. Slewinski, T.L. Diverse Functional roles of monosaccharide transporters and their homologs
in vascular plants: a physiological perspective. Mol. Plant 4, 641–662 (2011).
36. Milner, I.D., Ho, L.C. & Hall, J.L. Properties of proton and sugar transport at the tonoplast of
tomato (Lycopersicon esculentum) fruit. Physiol. Plant 94, 399–410 (1995).
37. Afoufa-Bastien, D. et al. The Vitis vinifera sugar transporter gene family: phylogenetic
overview and macroarray expression profiling. BMC Plant Biology 10, 245 (2010).
38. Perez-Rodriguez, P. et al. PlnTFDB: updated content and new features of the plant
transcription factor database. Nucleic Acids Res. 38, 822–827 (2010).
39. Wiese, A., Elzinga, N., Wobbes, B. & Smeekens, S. A conserved upstream open reading frame
mediates sucrose-induced repression of translation. Plant Cell 16, 1717–1729 (2004).
40. Thalor, S.K. et al. Deregulation of sucrose-controlled translation of a bZIP-type transcription
factor results in sucrose accumulation in leaves. PLoS ONE 7, e33111 (2012).
41. Michaels, S.D. & Amasino, R.M. FLOWERING LOCUS C encodes a novel MADS domain
protein that acts as a repressor of flowering. Plant Cell 11, 949–956 (1999).
42. Ratcliffe, O.J., Nadzan, G.C., Reuber, T.L. & Riechmann, J.L. Regulation of flowering in
Arabidopsis by an FLC homologue. Plant Physiol. 126, 122–132 (2001).
43. Ratcliffe, O.J., Kumimoto, R.W., Wong, B.J. & Riechmann, J.L. Analysis of the Arabidopsis
MADS AFFECTING FLOWERING gene family: MAF2 prevents vernalization by short
periods of cold. Plant Cell 15, 1159–1169 (2003).
Supplementary Figures
Supplementary Figure 1. 17-mer depth distribution of the Illumina GA reads. Reads from
libraries with clone insert sizes of 200 bp were used for analysis. A total of 4,639,223,061
17-mers were obtained, and the peak depth was 11. Watermelon genome size was estimated
based on the formula: Genome size = (Total number of k-mers)/(Position of peak depth) =
4,639,223,061 / 11 = 421.75 Mb.
[Plot: x-axis, Depth (0 to 50); y-axis, Number of 17-mers (0 to 30 x 10^6).]
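The genome size estimate in the caption of Supplementary Figure 1 can be reproduced with a few lines of Python. The function name is illustrative; the two input values are the ones stated in the caption.

```python
def estimate_genome_size(total_kmers, peak_depth):
    """Genome size = total number of k-mers / depth at the peak of the k-mer distribution."""
    return total_kmers / peak_depth

size_bp = estimate_genome_size(4_639_223_061, 11)
print(f"{size_bp / 1e6:.2f} Mb")  # prints "421.75 Mb"
```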
Supplementary Figure 2. Distribution of unassembled reads on watermelon chromosomes.
The color scale bar represents densities of the corresponding elements. TEs: transposable
elements; Unassembled: unassembled reads.
Supplementary Figure 3. Genome coverage evaluated by four fully sequenced BACs
(GenBank accession numbers: JN402338, JN402339, JX027061 and JX027062).
Supplementary Figure 4. Effects of sequence depth and large-insert reads on watermelon
genome assembly. (a) Scaffold N50 size and (b) total assembled genome size patterns of
assemblies with reads representing different sequence depths. (c) Scaffold N50 size and (d)
total assembled genome size patterns of assemblies with reads of different insert sizes (see
Supplementary Note).
[Plots: (a) Scaffold N50 length (Kb) versus data depth (X); (b) Total length (Mb)
versus data depth (X); (c) Scaffold N50 length (Kb) versus rank; (d) Total length (Mb)
versus rank.]
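For readers unfamiliar with the scaffold N50 statistic plotted in Supplementary Figure 4, the following sketch uses the conventional definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The function name and the toy lengths are illustrative and are not taken from the watermelon assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (0 for an empty collection)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings the running sum to half the total.
        if running * 2 >= total:
            return length
    return 0

# Toy example: total = 54, and 10 + 9 + 8 = 27 reaches half of it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```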
Supplementary Figure 5. Distribution of divergence rate for each type of TEs in the
watermelon genome. The divergence rate was calculated between the identified TE elements in
the genome and the consensus sequence in the TE library.
[Plot: x-axis, sequence divergence rate (%), 0 to 30; y-axis, percentage of genome (%),
0 to 1; series: LINE, DNA, LTR, SINE.]
Supplementary Figure 6. Distribution of TE insertion time of watermelon and cucumber.
MYA: million years ago.
[Panels: watermelon; cucumber.]
Supplementary Figure 7. Heat map of the watermelon genome component distribution in the
eleven chromosomes. RTs, retrotransposons; LTR_RT, long terminal repeat retrotransposon;
DNA-TEs, DNA transposons.
Supplementary Figure 8. rDNA pattern in watermelon genomes. Fluorescence in situ
hybridization (FISH) analyses were performed using 45S and 5S rDNAs as probes on genomes
of C. lanatus subsp. vulgaris (a), C. lanatus subsp. mucosospermus (b) and C. lanatus subsp.
lanatus (c). Chromosomes, 45S rDNAs and 5S rDNAs are shown in blue, green and red,
respectively. Illustrations of rDNA distributions on the 11 watermelon chromosomes are
provided for C. lanatus subsp. vulgaris and C. lanatus subsp. mucosospermus (d) and C.
lanatus subsp. lanatus (e). Green and red dots represent 45S and 5S rDNAs, respectively.
Supplementary Figure 9. Time inference of the watermelon whole genome duplication
(WGD) event and the divergence time estimation of the watermelon/cucumber speciation.
[Panel labels: 97103, JX-2, JLM, JXF, RZ-901, XHBFGM, Black Diamond, Calhoun Gray,
Sugarlee, Sy-904304, RZ-900, PI482271, PI500301, PI189317, PI595203, PI249010,
PI248178, PI482276, PI482303, PI296341-FR, PI482326.]
Supplementary Figure 10. Fruits of watermelon accessions used for genome sequencing and
resequencing.
Supplementary Figure 11. Distribution of disease resistance genes on watermelon
chromosomes.
Supplementary Figure 12. Venn diagram illustrating the extent to which the
watermelon, cucumber and pumpkin phloem transcriptomes contain common and
unique gene sets. The three cucurbit phloem sap transcriptomes were analyzed by
BLAST using an E-value cutoff of 1e-10 (see Supplementary Table 27). Data are
presented as percentages of the total number of transcripts for each cucurbit species.
The small number of unique pumpkin phloem transcripts (12.5%) reflects the absence
of a draft genome for this species; this compromised the identification of the full
phloem gene set in pumpkin.
Supplementary Figure 13. Model of sugar delivery to and metabolism within cells of
the watermelon fruit. Arrows indicate flux directions. Yellow block: No difference in
expression. For differentially expressed genes, the lowest and highest levels of
expression are represented by blue and red blocks, respectively. A green box indicates
higher expression in flesh than in rind, whereas a dark blue box indicates lower
expression in flesh than in rind. SE-CCC: sieve element companion cell complex; AGA:
α-galactosidase; GALK: galactokinase; UGGP: UDP-galactose/glucose
pyrophosphorylase; UGE: UDP-glucose 4-epimerase; UGP: UDP-glucose
pyrophosphorylase; PGM: phosphoglucomutase; HK: hexokinase; NI: neutral
invertase; IAI: insoluble acid invertase; SAI: soluble acid invertase; SUS: sucrose
synthase; FRK: fructokinase; PGI: phosphoglucoisomerase; SPS: sucrose phosphate
synthase; SPP: sucrose phosphate phosphatase; OPPP: oxidative pentose phosphate
pathway.
Supplementary Figure 14. Watermelon SC-uORF containing bZIP genes. (a) Phylogenetic
relationship of Arabidopsis bZIP proteins and bZIP proteins from other plant species
containing SC-uORF including four from watermelon. Accession or locus numbers: AtbZIP1
(At5g49450), AtbZIP2 (At2g18160), AtbZIP11 (At4g34590), AtbZIP44 (At1g75390),
AtbZIP53 (At3g62420), Am910 (Y13675), Am911 (Y13676), BZI-2 (AY045570), BZI-4
(AY045572), LIP19 (X57325), mLIP15 (D26563), OBF1 (X62745), rdLIP (AB015187),
TBZ17 (D63951), TBZF (AB032478), and OsOBF1 (AB185280). The subfamily of SC-uORF
containing genes is enclosed by a dotted box. The four watermelon genes are highlighted,
with the one differentially expressed during fruit flesh development shown in red. (b)
Alignment of the SC-uORF of bZIP proteins.
Supplementary Figure 15. MADS-box proteins from watermelon. Phylogenetic tree of
MADS-box family proteins of watermelon (pink dots), cucumber (green dots), and Arabidopsis
(yellow dots). Tomato LeMADS-RIN and TAGL1, and strawberry FaMADS-RIN (red dots)
were also included in the tree.
Supplementary Figure 16. Citrulline content in watermelon fruit flesh and rind. Data
represent means ± SE of two biological replicates. DAP: days after pollination.
[Plot: x-axis, 10, 18, 26 and 34 DAP; y-axis, Citrulline content (mg g-1 FW), 0 to 4.5;
series: flesh, rind.]
Supplementary Figure 17. Citrulline metabolic pathway in watermelon. Expanded gene
families in watermelon compared to Arabidopsis are highlighted in yellow while genes
differentially expressed during watermelon fruit development are highlighted in green.