Supporting Information, including Supporting Tables and Supporting Figures
Adaptive sequence evolution is driven by biotic stress in a pair of orchid species
(Dactylorhiza) with distinct ecological optima
Francisco Balao, Emiliano Trucchi, Thomas Wolfe, Bao-Hai Hao, Maria Teresa Lorenzo,
Juliane Baar, Laura Sedman, Carolin Kosiol, Fabian Amman, Mark W. Chase, Mikael
Hedrén, Ovidiu Paun
METHODS
RNAseq library construction and sequencing
The leaf material was fixed in RNAlater the same morning, left at 4 °C overnight and then transferred to -80 °C, where it was stored until required. Total RNA was extracted using the RNeasy Plant Mini Kit (Qiagen) following the manufacturer’s instructions, and the purified RNA was stored at -80 °C. RNA concentration was first measured with a NanoDrop ND-1000 Spectrophotometer (Thermo Scientific), and purity was estimated from the A260/280 absorbance ratio. RNA quantity and quality were then confirmed with an RNA 6000 Nano kit on a 2100 BioAnalyzer (Agilent Technologies). Ribosomal RNA was depleted with the RiboMinus™ Plant Kit for RNA-Seq (Invitrogen) following the manufacturer’s protocol. The RNA was fragmented by hydrolysis for 2 min at 94 °C, using 2 µl of 10x RT buffer (SuperScript III First-Strand Synthesis System for RT-PCR, Invitrogen) plus 75 mM MgCl2 in 12 µl of concentrated RNA. After a clean-up step with an RNeasy spin column (from an RNeasy Plant Mini Kit, Qiagen), first-strand cDNA was synthesized with the SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen) and random hexamers. Surplus dNTPs were removed with a Mini Quick Spin Column for DNA (Roche). Second-strand cDNA synthesis was performed with dUTPs, so that the strand of origin could be distinguished after Illumina sequencing. After a final clean-up step with a MinElute Reaction Cleanup Kit (Qiagen) and quantification using a Quant-iT PicoGreen dsDNA assay on a NanoDrop Fluorospectrometer ND-3300 (Thermo Scientific), the final RNAseq library preparation (using the NEBNext Ultra RNA kit and a UGdase treatment) and directional Illumina sequencing of 100 bp paired-end reads were performed at the CSF Vienna (www.csf.ac.at/ngs). Samples fB4, iB0 and iS4 (Table 1) were each sequenced on half a lane, sample fP7 on two half lanes, and the six other samples on full lanes.
Balao et al. SI - Transcriptomic divergence in Dactylorhiza
After checking read quality with FastQC v0.11.2 (available from http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), the raw reads were filtered with Trimmomatic v0.32 (Bolger et al., 2014) using default settings, except that the threshold for average quality per base was raised to 20 when scanning the reads over four-base sliding windows, and only adapter-free, high-quality reads at least 50 bases long were retained. After quality filtering, 78.6% of the read pairs were retained (Table S1). Only the filtered paired-end reads were used further, as the single-end files produced by Trimmomatic contained a negligible number of orphan reads. Finally, the reads were filtered against a database of all whole viral and bacterial genomes from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/ and ftp://ftp.ncbi.nlm.nih.gov/genomes/Viruses/, version 2014-07-03) using Bowtie2 v2.2.3 (Langmead and Salzberg, 2012) with default settings. Fewer than 1% of reads were identified as potential contamination from viruses or bacteria.
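The quality rule described above (four-base sliding windows, a minimum mean Phred score of 20, and a minimum length of 50 bases for both mates) can be illustrated with a short Python sketch. It is a simplified re-implementation for illustration only, not Trimmomatic's code: it assumes Phred+33 quality encoding, cuts each read at the start of the first window whose mean quality drops below 20, and omits adapter clipping. In Trimmomatic's own step syntax the rule would roughly correspond to SLIDINGWINDOW:4:20 followed by MINLEN:50, although the tool's exact cut position may differ in edge cases.

```python
"""Simplified sketch of sliding-window quality trimming plus a minimum-length filter.

Assumptions (not taken from the original pipeline): Phred+33 encoding and a cut
at the start of the first failing window; adapter clipping is not modelled.
"""
from typing import Optional, Tuple

Read = Tuple[str, str]  # (sequence, quality string)


def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(read: Read, window: int = 4, min_avg: float = 20.0) -> Read:
    """Scan 5' to 3' and cut where the window mean quality first falls below min_avg."""
    seq, qual = read
    scores = phred33(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            return seq[:start], qual[:start]
    return seq, qual


def filter_pair(r1: Read, r2: Read, min_len: int = 50) -> Optional[Tuple[Read, Read]]:
    """Trim both mates and keep the pair only if both are still >= min_len bases."""
    t1, t2 = sliding_window_trim(r1), sliding_window_trim(r2)
    if len(t1[0]) >= min_len and len(t2[0]) >= min_len:
        return t1, t2
    return None


good = ("A" * 60, "I" * 55 + "#" * 5)   # Q40 bases, then five Q2 bases at the 3' end
bad = ("C" * 60, "I" * 20 + "#" * 40)   # quality collapses after 20 bases
print(len(sliding_window_trim(good)[0]))  # 54
print(filter_pair(good, bad))             # None: the second mate is cut to 19 bases
```

Pairs in which only one mate survives are what Trimmomatic writes to its single-end (orphan) files; this sketch simply drops them, mirroring the decision above to use only the paired reads.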
De novo transcriptome assemblies and annotation
The transcriptome of each sequenced accession was assembled individually from its cleaned reads using Trinity version r20140413 (Grabherr et al., 2011; Haas et al., 2013) with a k-mer size of 25, strand-specific settings (i.e., FR orientation), and a minimum contig length of 200 bp. One individual assembly per species was then retained as reference, based on total length, N50 and the percentage of reads that mapped uniquely back to each assembly with CLC Genomics Workbench v7.5 (Qiagen), using strand-specific mapping, a minimum similarity fraction of 0.95 over at least 0.95 of the read length, and mismatch/insertion/deletion costs of 2/3/3. A multi-individual assembly was also attempted for each species; however, it neither increased the N50 length appreciably nor improved the back-mapping rate (results not shown), and it was discarded to avoid the risk of chimeric contigs.
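To make the back-mapping thresholds concrete, the sketch below tests alignments against a length fraction and a similarity fraction of 0.95 and derives unique and total mapping rates of the kind reported in Table S1. It assumes, as an approximation of the CLC Genomics Workbench settings, that the length fraction is the aligned share of the read and the similarity fraction is the share of identical positions within the aligned part. The `Alignment` record and its fields are hypothetical, and the 2/3/3 mismatch/insertion/deletion costs, which influence how CLC scores and chooses alignments, are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of the best alignment found for one read."""
    read_length: int
    aligned_length: int  # read bases covered by the alignment
    identical: int       # identical positions within the aligned part
    n_hits: int          # number of equally good reference locations


def passes(aln: Alignment, length_fraction: float = 0.95,
           similarity_fraction: float = 0.95) -> bool:
    """Apply the two acceptance thresholds described in the text."""
    if aln.read_length == 0 or aln.aligned_length == 0:
        return False
    return (aln.aligned_length / aln.read_length >= length_fraction
            and aln.identical / aln.aligned_length >= similarity_fraction)


def mapping_rates(alignments: list[Alignment], total_reads: int) -> tuple[float, float]:
    """Return (unique %, total %) of all reads whose alignment passes the thresholds."""
    mapped = [a for a in alignments if passes(a)]
    unique = sum(1 for a in mapped if a.n_hits == 1)
    return 100 * unique / total_reads, 100 * len(mapped) / total_reads


alns = [
    Alignment(100, 100, 99, 1),  # passes, maps to a single location
    Alignment(100, 100, 97, 3),  # passes, but maps to three locations
    Alignment(100, 90, 90, 1),   # fails: only 90% of the read is aligned
]
# The fourth read produced no alignment at all.
print(mapping_rates(alns, total_reads=4))  # (25.0, 50.0)
```

A large gap between the two rates, as for fB4 or iB0 in Table S1, indicates that many reads map to several transcripts of the same assembly.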
Ecologically relevant genes are expected to be generally expressed at low levels, and hence may be missed in individual assemblies. To account for this, for each accession we retained the reads that did not map to the selected species-specific assembly; these reads were pooled per species and re-assembled with Trinity using the same settings as above. The resulting D. fuchsii and D. incarnata “lowly expressed” contigs were then added to the respective species reference. A common reference for the two species was constructed by pooling the two species references. Finally, we removed redundancy from each of the three references using the clustering algorithm of CD-HIT-EST (Fu et al., 2012) with a global identity of 80% over at least 70% of the length of the shorter sequence (i.e., -aS 0.7), comparing only the 5’-3’ strands (i.e., -r No). The final references are hereafter referred to as “f_reference”, “i_reference” and the combined Dactylorhiza “fi_reference”.
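CD-HIT-EST performs greedy incremental clustering: sequences are processed from longest to shortest, and each one either joins the first cluster whose representative it matches or founds a new cluster. The sketch below illustrates that logic with the thresholds quoted above (global identity of at least 0.8, alignment covering at least 70% of the shorter sequence, forward strand only). It is a toy stand-in, not CD-HIT: for simplicity it scores only ungapped overlaps, whereas CD-HIT uses a short-word filter and gapped alignment. Following our understanding of CD-HIT's global-identity definition, identity is computed as identical bases divided by the full length of the shorter sequence. On the cd-hit-est command line these settings would presumably be written as -c 0.8 -aS 0.7 -r 0; this mapping is our assumption, not a statement of the command actually run.

```python
def best_ungapped_overlap(short: str, long: str) -> tuple[int, int]:
    """Slide `short` along `long` (forward strand only) and return
    (identical bases, overlap length) for the offset with the most identities."""
    best = (0, 0)
    for offset in range(-len(short) + 1, len(long)):
        start_s = max(0, -offset)
        start_l = max(0, offset)
        overlap = min(len(short) - start_s, len(long) - start_l)
        matches = sum(short[start_s + i] == long[start_l + i] for i in range(overlap))
        if matches > best[0]:
            best = (matches, overlap)
    return best


def greedy_cluster(seqs: dict[str, str], identity: float = 0.8,
                   coverage: float = 0.7) -> dict[str, list[str]]:
    """Toy greedy incremental clustering of non-empty sequences.

    Returns {representative name: [member names]}.
    """
    clusters: dict[str, list[str]] = {}
    # Longest sequences first, so each representative is the longest member.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        seq = seqs[name]
        for rep in clusters:
            rep_seq = seqs[rep]
            short, long = (seq, rep_seq) if len(seq) <= len(rep_seq) else (rep_seq, seq)
            matches, overlap = best_ungapped_overlap(short, long)
            if matches / len(short) >= identity and overlap / len(short) >= coverage:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no match: start a new cluster
    return clusters


a = "ACGTTGCAAGGCTTACCGAT" * 2
example = {
    "a": a,
    "b": a[:10] + "T" + a[11:],  # one substitution relative to "a"
    "c": "T" * 30,               # unrelated low-complexity sequence
    "d": a[5:35],                # 30-nt fragment contained in "a"
}
print(greedy_cluster(example))  # {'a': ['a', 'b', 'd'], 'c': ['c']}
```

Only the representatives (here "a" and "c") would be kept in a non-redundant reference; the other members are discarded as redundant.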
The individual Trinity assemblies contained between 7,686 (fA6) and 70,193 transcripts (iS8; Table S1), with N50 ranging from 303 bp (fA6) to 472 bp (fB4). Based on richness (i.e., total length; Table S1), contiguity (i.e., N50) and completeness (i.e., mapping rate), we retained the assemblies of fB5 and iS8 as the most representative transcriptomes for D. fuchsii and D. incarnata, respectively. To better represent lowly expressed genes, all D. fuchsii and all D. incarnata reads that did not map to these reference transcriptomes were assembled per species with Trinity, yielding 3,113 and 2,064 transcripts for D. fuchsii and D. incarnata, respectively. After combining the different assemblies and removing redundancy as explained above, the final combined fi_reference contains 33.8 Mbp in 101,010 transcripts (52.3% originating from D. incarnata and 47.7% from D. fuchsii; Table S2). This transcriptome should be valuable for further studies of gene expression changes following whole-genome duplication events in Dactylorhiza.
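N50, used above together with total length and mapping rate to compare assemblies, is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The sketch below computes it, along with the other per-assembly summaries reported in Table S1 (number of sequences, total length in Mb and GC content). It is an illustrative implementation, not the code used to produce the reported values.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def assembly_stats(seqs: list[str]) -> tuple[int, float, int, float]:
    """Return (number of sequences, total length in Mb, N50 in bp, GC %)."""
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    gc_pct = 100 * gc / total if total else 0.0
    return len(seqs), total / 1e6, n50(lengths), gc_pct


print(n50([100, 200, 300, 400]))  # 300: 400 + 300 = 700 bp, more than half of 1,000 bp
```

Because N50 is driven by the longest sequences, it is read together with total length and mapping rate rather than on its own: an assembly can raise its N50 simply by losing short transcripts.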
Functional annotation of the fi_reference was performed in Blast2GO v.3.2.7 (Conesa and Götz, 2008) using cloud-based NCBI BLAST+ searches with the BLASTX algorithm against the Viridiplantae database v. 17.01.2016 and an e-value cutoff of 10⁻⁶. Blast2GO was further used to assign gene ontology (GO) terms to the contigs, to identify protein domain signatures using InterProScan (Quevillon et al., 2005), and to annotate non-coding RNAs and cis-regulatory elements by performing Rfam scans against the Xfam servers. Of the 101,010 transcripts in the fi_reference, 42.2% had significant BLASTX hits in Blast2GO. After mapping these contigs and integrating the InterProScan results, 30,046 contigs were annotated with GO terms (Fig. S2). In addition, 121 contigs received annotations from the Rfam scan, of which 80 were classified as rRNA and 16 as intronic regions.
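The e-value cutoff means that a transcript counts as having a significant hit only if at least one alignment has an e-value of at most 10⁻⁶. Blast2GO applied this internally here; purely as an illustration, the same filter can be applied to standard BLAST+ tabular output (-outfmt 6, whose 11th and 12th columns are the e-value and bit score), keeping the best-scoring passing hit per query and reporting the share of transcripts with a hit. The query and subject identifiers in the example rows are invented placeholders.

```python
import csv
import io


def best_hits(tsv_text: str, max_evalue: float = 1e-6) -> dict[str, tuple[str, float]]:
    """Return {query: (subject, evalue)} for the highest-bit-score hit passing the cutoff.

    Expects BLAST+ -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best: dict[str, tuple[str, float, float]] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # not significant under the cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return {q: (s, e) for q, (s, e, _) in best.items()}


def annotated_fraction(hits: dict, n_transcripts: int) -> float:
    """Percentage of transcripts with at least one significant hit."""
    return 100 * len(hits) / n_transcripts


EXAMPLE_TSV = (  # hypothetical rows for illustration only
    "t1\tXP_001\t85.0\t120\t18\t0\t1\t360\t5\t124\t1e-50\t180\n"
    "t1\tXP_002\t70.0\t110\t33\t1\t1\t330\t2\t111\t1e-20\t95\n"
    "t2\tXP_003\t40.0\t60\t36\t2\t10\t190\t1\t60\t0.001\t35\n"
)
hits = best_hits(EXAMPLE_TSV)
print(hits)                                       # {'t1': ('XP_001', 1e-50)}
print(annotated_fraction(hits, n_transcripts=4))  # 25.0
```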
Small RNA library preparation and sequencing
Isolation of RNA with smRNA enrichment was performed with the mirVana miRNA Isolation Kit (Life Technologies) following the manufacturer’s instructions, using up to 90 mg of the same fixed tissue as for the RNAseq experiment. The concentration of the raw smRNA isolates was measured with a Quant-iT PicoGreen dsDNA assay (Invitrogen) on a NanoDrop Fluorospectrometer ND-3300 (Thermo Scientific). RNA quantity and quality were then confirmed on a 2100 Bioanalyzer (Agilent Technologies) using a small RNA analysis kit. The RNA extracts were denatured with loading buffer at 95 °C for 3 min and further purified by gel size selection in an XCell SureLock Mini-Cell (Life Technologies), using a microRNA Marker (NEB) and 15% precast TBE-Urea gels (Life Technologies) stained with 2x SYBR Green II (Life Technologies) for 1 h at room temperature. The smRNA was recovered from the gel slices by overnight incubation in 0.3 M NaCl at 4 °C, followed by filtration through Ultrafree-MC Durapore 0.22 µm Filter Units (Merck Millipore) and precipitation for 2 h at -20 °C in 2.5 volumes of 100% ethanol with 1 µl GlycoBlue (Ambion). After resuspension in RNase-free water, the smRNA isolates were checked again on a Bioanalyzer small RNA chip. For individual iB6, the isolate yielded an insufficient quantity of smRNA and, as no further tissue was available, this sample was not processed further. The smRNA libraries were prepared with the NEBNext Multiplex Small RNA Library Prep Set for Illumina (Sets 1 and 2, NEB) following the manufacturer’s protocol, except that the adapters were diluted 1:2 prior to use due to low sample concentration (130<mg) and only 1.5 µl instead of 2.5 µl of the primers was used. The libraries were purified on a 1.5% agarose gel with 1x TBE buffer, and the DNA was extracted from the gel slices with a MinElute Gel Extraction Kit (Qiagen) and eluted in water. Illumina sequencing of 50 bp single-end reads on a half lane was performed at the CSF Vienna (www.csf.ac.at/ngs). The individual samples were demultiplexed with the BamIndexDecoder tool of the Illumina2Bam software collection (available from http://gq1.github.io/illumina2bam/), and adapter sequences were removed using Trimmomatic v0.30 (Bolger et al., 2014). After demultiplexing and adapter removal, the number of reads longer than 15 nucleotides averaged ca. 9.9 million per sample (SD 5.3 million) across the samples of both species (D. fuchsii: mean 13.2 million, SD 4.2 million; D. incarnata: mean 5.8 million, SD 3.4 million).
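Adapter removal and the length cut-off (reads longer than 15 nt) can be illustrated with a simplified 3′-adapter trimmer. This sketch is not Trimmomatic's ILLUMINACLIP algorithm, which tolerates mismatches; it uses exact matching only. Two inputs are assumptions: the adapter string below is our understanding of the NEBNext 3′ SR adaptor as it appears in read sequence (check it against the kit manual), and `min_overlap=5` is an arbitrary illustrative choice for detecting adapter fragments at the very end of a read.

```python
# Assumed 3' adapter sequence as seen in the reads; verify against the NEBNext manual.
ADAPTER = "AGATCGGAAGAGCACACGTCT"


def trim_3p_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 5) -> str:
    """Remove a full adapter match, or a partial adapter at the very 3' end."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]  # everything from the adapter onwards is discarded
    # Partial adapter: the read ends with the first k bases of the adapter.
    for k in range(min(len(read), len(adapter) - 1), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def clean_reads(reads: list[str], min_len: int = 16) -> list[str]:
    """Trim adapters and keep only inserts longer than 15 nt."""
    kept = []
    for read in reads:
        insert = trim_3p_adapter(read)
        if len(insert) >= min_len:
            kept.append(insert)
    return kept


reads = [
    "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER + "ACGT",  # 22-nt insert, full adapter
    "TCGGACCAGGCTTCATTCCCC" + "AGATCGG",          # 21-nt insert, partial adapter
    "ACGTACGTAC" + ADAPTER,                        # 10-nt insert, too short
]
print(clean_reads(reads))  # ['TGAGGTAGTAGGTTGTATAGTT', 'TCGGACCAGGCTTCATTCCCC']
```

The 15-nt floor removes adapter dimers and degradation fragments while keeping the 20-22 nt miRNA/tasiRNA and 24 nt siRNA size classes analysed later.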
REFERENCES
Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114-2120.
Conesa A, Götz S. 2008. Blast2GO: a comprehensive suite for functional analysis in plant genomics. Int. J. Plant Genomics 2008: 619832.
Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28: 3150-3152.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, et al. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29: 644-652.
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, et al. 2013. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8: 1494-1512.
Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9: 357-359.
Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R. 2005. InterProScan: protein domains identifier. Nucleic Acids Res. 33: W116-W120.
Table S1. Summary of the individual RNAseq Trinity assemblies. Unique and total mapping rates refer to mapping, with CLC Genomics Workbench, of each sample's reads back to the Trinity assembly built from those same reads.
Accession  Filtered read pairs (M)  Transcripts  Contigs  Total length (Mb)  N50 (bp)  GC (%)  Unique/total mapping (%)
fA6 183.9 7,686 6,163 2.5 303 50.3 -
fP1 40.5 15,373 13,348 5.9 391 51.0 85.4/85.7
fP7 112.5* 48,895 35,771 19.0 385 52.9 26.4/47.9
fB4 91.4 61,109 42,999 26.4 472 49.7 18.1/79.9
fB5 190.3 62,338 46,340 23.0 361 50.4 86.8/87.3
iA6 168.1 24,785 19,818 8.3 311 51.2 95.0/95.2
iS4 73.6 31,404 24,208 12.2 391 50.8 35.9/81.2
iS8 148.7 70,193 53,126 23.7 322 51.2 86.2/86.6
iB6 74.4 17,125 14,411 5.9 328 52.7 86.6/86.8
iB0 66.8 29,697 22,604 12.2 431 50.7 16.3/79.2
*summed over two half lanes
Table S2. Summary of the final reference transcriptomes.
Species  Assembly  Transcripts  Contigs  Total base pairs  N50 (bp)
D. fuchsii f_reference 54,596 43,834 19,420,738 356
D. incarnata i_reference 61,841 49,960 20,394,752 314
Combined fi_reference 101,010 88,461 33,794,033 319
Supporting Figure S1. Map of the native localities of the Dactylorhiza samples analysed here. The samples were transplanted in a common garden at least one growing season before the material was fixed for analyses.
[Figure S2 graphics, panels (a), (b) and (e): (a) counts of total sequences and of sequences with InterProScan results, BLAST hits, mapping and annotation; (b) number of top BLAST hits per plant species; (e) number of sequences per enzyme class (oxidoreductases, transferases, hydrolases, lyases, isomerases, ligases).]
Supporting Figure S2. Results of the annotation analyses of the combined Dactylorhiza fi_reference with Blast2GO. (a) Data distribution. (b) BLAST top-hit species distribution. (c) Annotated sequences by sequence length. (d) GO distribution at level 2 (top 20). (e) Enzyme code distribution. (f) Rfam biotype sequence distribution.
[Figure S2 graphics, panels (c), (d) and (f): (c) percentage of annotated sequences by sequence length (bp); (d) number of sequences per level-2 GO term for Biological Process, Molecular Function and Cellular Component; (f) number of sequences per Rfam biotype.]
[Figure S3 graphic: tree including D. fuchsii B4, B5, P1 and P7, D. incarnata A6, B0, B6, S4 and S8, and Orchis italica; scale bar 0.1.]
Supporting Figure S3. Maximum-likelihood RAxML phylogenetic tree based on 449,518 high-quality cSNPs, illustrating the relationships between the Dactylorhiza accessions analysed. Bootstrap percentages are indicated.
[Figure S4 graphics, panels (a-d): PCA score plots of the accessions; axis labels in the graphics read V1 (66.4%) and V2 (5.9%), V1 (53.9%) and V2 (12.0%), and V1 (55.8%) and V2 (12.5%).]
Supporting Figure S4. Transcriptome variation within and between D. fuchsii and D. incarnata. (a) SNPRelate PCA on 129,511 filtered biallelic cSNP variants. (b) PCA showing the largest components of variance in gene expression as uncovered by edgeR. (c) PCA drawn with EDASeq on the patterns of expressed miRNA and tasiRNA (i.e., analysis of 20-22 nt small RNAs). (d) EDASeq PCA on patterns of siRNA expression (i.e., analysis of 24 nt small RNAs).
[Additional PCA score plot: V1 (70.5%), V2 (13.3%); accessions iA6, iB0, iB6, iS4, iS8, fB4, fB5, fP1 and fP7.]
[Figure S5 graphics: (a) Ka versus Ks for D. fuchsii vs. O. italica, D. incarnata vs. O. italica and D. incarnata vs. D. fuchsii, with points classed as Ka/Ks > 1 or Ka/Ks ≤ 1; (b) boxplots of Ks and Ka for each pairwise comparison.]
Supporting Figure S5. Synonymous and non-synonymous substitution rates. (a) Ka/Ks plots of the pairwise analyses. Red dots represent putative CDS showing signals of positive selection; grey dots indicate putative CDS showing signals of purifying selection. (b) Boxplots of the synonymous (Ks) and non-synonymous (Ka) substitution rates in intergeneric (blue) and intrageneric (green) comparisons. Note the different scales on the y-axes.
[Figure S6 graphics, panels (a) and (b): bubble plots of enriched GO molecular functions in semantic space (X, Y).]
Supporting Figure S6. Enriched molecular functions (p < 0.01) with elements targeted by positive selection in D. fuchsii (a) and in D. incarnata (b). Bubble size is proportional to the frequency of the respective term in the public GO database. The colour represents the log10 value of the significance of the Fisher’s test of enrichment, according to the indicated scale. Ac, activity; Bi, binding.
[Figure S7 graphic, panel (b): overlap of the DE results from edgeR, DESeq2 and baySeq.]
Supporting Figure S7. Differential gene expression analysis of D. incarnata versus D. fuchsii. (a) MA-plot of relative expression levels drawn with edgeR. The x-axis shows the log2 counts per million of mapped reads (CPM) for each cluster; the y-axis shows the log2 expression fold change (FC) for each transcript. Red dots represent the clusters that were differentially expressed (DE) between the two Dactylorhiza species. (b) Intersection of the DE results of the three tests performed at a false discovery rate (FDR) of 0.05. (c) Histogram of the log2 expression fold change values. (d) Heat map of the top 50 most differentially expressed clusters between D. incarnata and D. fuchsii.
[Figure S7 graphics, panels (a), (c) and (d); in panel (c) the x-axis is logFC and the y-axis is frequency.]
[Figure S8 graphics, panels (a) and (b): bubble plots of enriched GO molecular functions in semantic space (X, Y).]
Supporting Figure S8. Enriched molecular functions (p < 0.01) affected by overexpression in D. fuchsii (a) and D. incarnata (b). The colour of the bubbles shows the log10 of the p-value of the enrichment test, according to the indicated scale. Bubble size is proportional to the frequency of the respective GO term in the public GO database. Ac, activity; Bi, binding.
Supporting Figure S9. Enriched (p < 0.01) biological processes (a-b) and molecular functions (c-d) with elements differentially targeted (FDR < 0.05) by miRNAs/tasiRNAs between D. fuchsii and D. incarnata. (a, c) Enriched GO terms of genes with increased mi/tasiRNA targeting in D. fuchsii. (b, d) Enriched GO terms of genes with over-regulation by mi/tasiRNAs in D. incarnata. Bubble size is proportional to the frequency of the respective term in the public GO database. The colour represents the log10 value of the significance of the Fisher’s test of enrichment, corresponding to the indicated scale. C, compound; Ac, activity; SAc, synthase activity; Bi, binding; Sy, synthesis; Pr, process; MPr, metabolic process.
[Figure S9 graphics, panels (a-d): bubble plots of enriched GO terms in semantic space (x, y).]
Supporting Figure S10. Enriched (p < 0.01) biological processes (a-b) and molecular functions (c-d) with elements differentially targeted (FDR < 0.05) by siRNAs between D. fuchsii and D. incarnata. (a, c) Enriched GO terms of genes with increased siRNA targeting in D. fuchsii. (b, d) Enriched GO terms of genes with over-regulation by siRNAs in D. incarnata. Bubble size is proportional to the frequency of the respective term in the public GO database. The colour represents the log10 value of the significance of the Fisher’s test of enrichment, corresponding to the indicated scale. C, compound; Ac, activity; SAc, synthase activity; Bi, binding; Sy, synthesis; Pr, process; MPr, metabolic process.
[Figure S10 graphics, panels (a-d): bubble plots of enriched GO terms in semantic space (x, y).]