Running head: Transcriptional modules in maize development
Corresponding Author: Lewis Lukens, Associate Professor, Department of Plant Agriculture, Crop Science Building, University of Guelph, 50 Stone Rd. E., Guelph, Ontario, N1G 2W1, Canada. Phone: 519-824-4120 x. 52304; Fax: 519-763-8933; Email: [email protected]
Research category: Genome Analysis
Plant Physiology Preview. Published on February 7, 2013, as DOI:10.1104/pp.112.213231
Copyright 2013 by the American Society of Plant Biologists
A developmental transcriptional network for Zea mays defines coexpression modules
Gregory S. Downs1, Yong-Mei Bi2, Joseph Colasanti2, Wenqing Wu2, Xi Chen3, Tong Zhu3, Steven J. Rothstein2, Lewis N. Lukens1
1 Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada, N1G 2W1
2 Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON, Canada, N1G 2W1
3 Syngenta Biotechnology Inc., 3054 Cornwallis Road, Research Triangle Park, NC, USA, 27709
Summary: Through hierarchical clustering of transcript abundance data across a diverse set of tissues and developmental stages in maize, we identified several coexpression modules that describe the transcriptional circuits of maize development.
Keywords:
Zea mays Development Plant Gene Expression Regulation Gene Regulatory Networks/genetics Maize Transcriptome Oligonucleotide Array Sequence Analysis Systems Biology Bioinformatics
Financial Source: This work was made possible by support from the Ontario Research Fund and Natural Sciences and Engineering Research Council of Canada.
ABSTRACT
Here we present a genome-wide overview of transcriptional circuits in the agriculturally significant crop species Zea mays. We examined transcript abundance data at 50 developmental stages, from embryogenesis to senescence, for 34,876 gene models and classified genes into 24 robust coexpression modules. Modules were strongly associated with tissue types and related biological processes. Sixteen of the 24 modules (67%) have preferential transcript abundance within specific tissues, and one-third of modules show an absence of gene expression in specific tissues. Genes within a number of modules also correlated with the developmental age of tissues. Coexpression of genes is likely due to transcriptional control. For a number of modules, key genes involved in transcriptional control have expression profiles that mimic those of module genes, although the expression of transcriptional control genes is not unusually representative of module gene expression. Known regulatory motifs are enriched in several modules. Finally, of the 13 network modules with more than 200 genes, three contain genes that are significantly clustered (p<0.05) within the genome. This work, based on a carefully selected set of major tissues representing diverse stages of Zea mays development, demonstrates the remarkable power of transcript-level coexpression networks to identify underlying biological processes and their molecular components.
INTRODUCTION
Systems biology approaches have recently begun to elucidate the patterns of transcriptome organization. In contrast to analyses that compare whole transcriptomes of samples, or that compare mean gene expression levels between samples, the systems biology strategy integrates the expression patterns of single genes to infer their common biological function. Genes with coordinated expression across samples are hypothesized to be co-regulated in response to external and internal cues and to be regulated by similar transcription factors (Moreno-Risueno et al., 2010). Inferring gene regulatory networks from transcriptome data and subsequently testing the attributes of the network provide a system-wide view of developmental processes.
Several studies have pooled diverse assortments of publicly available microarray data to identify clusters of plant genes with shared patterns of expression (Fierro et al., 2008; Ficklin et al., 2010; Mochida et al., 2011). Some modules are conserved across species (Ficklin and Feltus, 2011; Movahedi et al., 2011; Mutwil et al., 2011). For example, modules associated with drought stress responses and cellulose biogenesis are common to Hordeum vulgare, Arabidopsis thaliana, and Brachypodium distachyon (Mochida et al., 2011).
The great functional and morphological variation among plant tissue types arises from differential regulation of a finite set of genomic transcripts. Microarray technology has been used to compare gene transcript abundances between different tissues (Ma et al., 2005; Schmid et al., 2005; Benedito et al., 2008; Jiao et al., 2009; Sekhon et al., 2011). These studies have identified genes that are transcribed in specific organs and have examined the relationships among tissue expression patterns using principal component analysis. They have also noted that similar tissues have more highly correlated transcript abundances than dissimilar tissues; e.g., the correlation coefficient between two developmental stages of leaf tissue is greater than that between leaf and another tissue.
Here, we constructed a developmental gene expression network from microarray transcriptome profiles of 50 Zea mays (maize) tissues across different stages of development and identified modules of putatively co-regulated genes within this network. We characterized the attributes of modules to begin to understand transcriptome organization. Specifically, we investigated whether network modules are associated with specific tissue types and are enriched for specific biological processes. Further, we determined whether modules are specifically excluded from tissues and whether modules reflect developmentally responsive processes. We also investigated the centrality of transcription factors within modules and whether modules share common cis-regulatory motifs. Finally, we determined whether modules contain genes that are clustered within the genome. This work explores the gene expression network throughout maize development for an inbred genotype grown under controlled conditions and describes the remarkably discrete functionalities of modules within the network.
RESULTS
Gene networks of the maize developmental transcriptome
We set out to investigate the organization of the maize transcriptome throughout development by analyzing a microarray data set generated from three biological replicates of 50 tissue types (Table I). Samples spanned development, from early embryo to senescence-stage leaves, and included anthers, cob, ear, embryo, endosperm, husk, leaf, ovule, pericarp, root, silk, stalk, and tassel. Several of these tissues, including ear, leaf, and tassel, were sampled at multiple stages of development (Table I, Figure 1). All processed RNAs were hybridized to a custom microarray with 82,661 probe sets and 1,322,576 probes. Of these, 55,672 probe sets were expressed in at least one tissue (Figure 2), and 33,664 of the expressed probe sets mapped to the filtered gene set of the maize gene models, release 4a.53 (Schnable et al., 2009). An additional 9,919 probe sets that were not annotated as genes mapped to the maize genome, and 12,089 probe sets did not match the genome using our criteria (Figure 2). To ensure the highest level of data quality, we removed redundant and non-specific probe sets from those that matched the genome prior to data analysis. In the end, we examined 34,876 probe sets (Figure 2, Table S1). For clarity, we refer to these probe sets as genes.
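The probe-set counts above form a simple partition that can be checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in this paper; the difference between arrayed and expressed probe sets equals the 26,989 probe sets that Materials and Methods reports as removed for falling below the detection threshold.

```python
# Probe-set accounting: every count below is copied from the text.
total_probe_sets = 82_661
expressed = 55_672                      # detected in at least one tissue
categories = {
    "filtered gene models": 33_664,
    "genome, not annotated as genes": 9_919,
    "no genome match": 12_089,
}

# The three mapping categories should partition the expressed probe sets.
assert sum(categories.values()) == expressed

below_detection = total_probe_sets - expressed
for label, n in categories.items():
    print(f"{label}: {n:,} ({100 * n / expressed:.1f}% of expressed)")
print(f"below detection in every tissue: {below_detection:,}")
```

The printed proportions (60.5%, 17.8%, and 21.7%) correspond to the rounded 60%, 18%, and 22% quoted in the Discussion.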
We clustered all sample transcriptome profiles using the flashClust function of WGCNA (Langfelder and Horvath, 2008) to obtain an overview of transcriptome relationships. With few exceptions, biological replicate arrays cluster within a group containing only the replicates of the tissue at that stage, and arrays from the same tissue cluster together (Figure S1). Figure 3 shows a dendrogram of all 50 tissues constructed from average transcript abundances across replicate arrays. Distinct groups contain leaf, root, seed, and silk expression profiles. Transcriptomes of the same tissue harvested at different developmental stages largely cluster together, as do tissues with strong developmental similarity, such as the V7 tassel and V7 ear. Nonetheless, some groups contain mixed tissues or do not contain all arrays from a tissue type. For example, the R1 stalk is grouped with leaf transcriptomes, and the pre-photosynthetic VE leaf did not cluster with other leaves (Figure 3). The innermost husk, a modified leaf, clustered with inflorescence tissues (cob, ovule, silk, and tassel) rather than with leaves (Figure 3).
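The paper does not state which distance or linkage method was used with flashClust for the sample dendrograms. As an illustration only, the Python sketch below clusters samples with average linkage on 1 − Pearson r using SciPy, an assumed stand-in for flashClust (which provides the same agglomerative hierarchical clustering as R's hclust). The toy data are synthetic: two hypothetical tissues with three replicates each.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_samples(expr, method="average"):
    """Hierarchically cluster samples (columns of a genes x samples matrix)
    using 1 - Pearson correlation as the distance (an assumed choice)."""
    dist = pdist(expr.T, metric="correlation")  # pdist compares rows, so transpose
    return linkage(dist, method=method)

# Toy data: 200 genes; two hypothetical tissues, three noisy replicates each.
rng = np.random.default_rng(0)
leaf_profile = rng.normal(8, 2, size=200)
root_profile = rng.normal(8, 2, size=200)
expr = np.column_stack(
    [leaf_profile + rng.normal(0, 0.3, 200) for _ in range(3)]
    + [root_profile + rng.normal(0, 0.3, 200) for _ in range(3)]
)

Z = cluster_samples(expr)
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into two groups
print(labels)
```

Because replicate noise is small relative to between-tissue differences, replicates of each toy tissue fall into the same group, mirroring the replicate behavior described for Figure S1.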
We constructed a weighted gene coexpression network with the R package WGCNA by transforming the 34,876 genes' pairwise Pearson correlation coefficients into a weighted adjacency matrix (Langfelder and Horvath, 2008). We created a signed network, which allows modules to contain both positively and negatively correlated genes since transcripts involved in one process may be up- or down-regulated. The topological overlap measure, or TO (Li and Horvath, 2007), was used to transform the adjacency matrix into a coexpression distance matrix. Genes were clustered hierarchically, and a dynamic tree-cutting algorithm cut the dendrogram and defined 49 modules. Gene module assignments are given in Table S2. The modules range in size from 30 to 4,370 genes (mean 712, median 123). To validate modules, we compared the mean TO value for each module to a distribution of mean TO values from 50,000 random modules, each composed of a randomly selected group of genes (Ravasz et al., 2002; Yip and Horvath, 2007). We focused on the 24 of the 49 modules that were validated as significant (P<0.05; Table S3). These 24 modules contain 30,768 genes. Module eigengenes (MEs) were calculated for each module as the first principal component of the module's gene expression matrix; an eigengene can be considered a vector of gene expression values characteristic of its module. Correlations between MEs indicate that most modules have an eigengene with a correlation pattern similar to that of the eigengene of one or more other modules (Figure S2). More permissive criteria for module identification could merge these modules, but subsequent analyses revealed that they have distinct attributes.
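The network steps above can be sketched compactly in Python. This is a minimal illustration, not the WGCNA implementation: it uses the signed adjacency ((1 + r)/2)^β with an assumed β = 12 (the paper does not report its power), the standard topological overlap formula, a permutation test of mean within-module TO against random gene sets of equal size, and an eigengene taken as the first principal component of the standardized module matrix. Under this signed form, a strong negative correlation yields an adjacency near zero, whereas WGCNA's unsigned form |r|^β weights positive and negative correlations equally. The toy run uses 200 permutations rather than the 50,000 described in the text.

```python
import numpy as np

def signed_adjacency(expr, beta=12):
    """Signed adjacency ((1 + r) / 2) ** beta between genes (columns of
    a samples x genes matrix). beta = 12 is an illustrative assumption."""
    r = np.corrcoef(expr, rowvar=False)
    return ((1.0 + r) / 2.0) ** beta

def topological_overlap(adj):
    """TO_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    a = adj.copy()
    np.fill_diagonal(a, 0.0)            # no self-connections
    k = a.sum(axis=1)                   # connectivity of each gene
    shared = a @ a                      # shared neighbours; zero diagonal drops u = i, j
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom

def mean_within_to(tom, idx):
    """Mean TO over all distinct gene pairs in a module."""
    sub = tom[np.ix_(idx, idx)]
    n = len(idx)
    return (sub.sum() - n) / (n * (n - 1))   # subtract the diagonal of ones

def module_permutation_p(tom, idx, n_iter=1000, seed=0):
    """Empirical p-value: fraction of random equal-sized gene sets whose mean
    TO is at least as high as the module's."""
    rng = np.random.default_rng(seed)
    observed = mean_within_to(tom, idx)
    hits = sum(
        mean_within_to(tom, rng.choice(tom.shape[0], size=len(idx), replace=False))
        >= observed
        for _ in range(n_iter)
    )
    return (hits + 1) / (n_iter + 1)

def module_eigengene(expr, idx):
    """First principal component of the standardized module expression matrix,
    oriented to correlate positively with mean module expression."""
    x = expr[:, idx]
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    u, _, _ = np.linalg.svd(x, full_matrices=False)
    me = u[:, 0]
    if np.corrcoef(me, x.mean(axis=1))[0, 1] < 0:
        me = -me
    return me

# Toy data: 50 samples; 20 genes driven by one hidden factor plus 80 noise genes.
rng = np.random.default_rng(1)
factor = rng.normal(size=50)
module_genes = factor[:, None] + rng.normal(0, 0.4, size=(50, 20))
expr = np.hstack([module_genes, rng.normal(size=(50, 80))])
idx = np.arange(20)

tom = topological_overlap(signed_adjacency(expr))
p = module_permutation_p(tom, idx, n_iter=200)
me = module_eigengene(expr, idx)
print(f"permutation p = {p:.4f}; r(eigengene, factor) = {np.corrcoef(me, factor)[0, 1]:.3f}")
```

Dynamic tree cutting is omitted here; the sketch takes the module membership (`idx`) as given.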
Many modules correlate with specific tissue types
We investigated whether each module's eigengene had significantly higher expression in specific tissues relative to all other tissues. Tissue-specific modules may also contain genes with low expression in one tissue type relative to others. Sixteen of the 24 modules (67%) are moderately to highly correlated with a tissue type (r>0.4; Figure 4). One or more modules are correlated with anthers, ear, embryo, endosperm, leaf, pericarp, root, and tassel. No module is positively correlated with cob, floret, husk, ovule, stalk, or silk (Figure 4). Of the 24 modules, eight had eigengenes that are moderately to highly negatively correlated (r<-0.4) with anthers, endosperm, leaf, root, or stalk (Figure 4). Two of these eight modules also had positive correlations with tissue types, so only two of the 24 modules were not associated with a specific tissue type.
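A minimal sketch of the tissue test, assuming each tissue is coded as a 0/1 indicator across samples and correlated with the module eigengene by Pearson correlation; the paper reports r and p values but not the exact coding, so this is an assumption. The |r| > 0.4 cutoff follows the text, and both positive (preferential expression) and negative (exclusion) associations are returned. Tissue names and eigengene values in the toy example are synthetic.

```python
import numpy as np
from scipy.stats import pearsonr

def tissue_correlations(eigengene, tissue_labels, r_cutoff=0.4):
    """Correlate a module eigengene with a 0/1 indicator for each tissue.

    Returns {tissue: (r, p)} for every tissue with |r| > r_cutoff, so both
    preferential expression (r > 0) and exclusion (r < 0) are reported."""
    labels = np.asarray(tissue_labels)
    hits = {}
    for tissue in np.unique(labels):
        indicator = (labels == tissue).astype(float)
        r, p = pearsonr(eigengene, indicator)
        if abs(r) > r_cutoff:
            hits[str(tissue)] = (float(r), float(p))
    return hits

# Toy example: 3 samples each of 6 hypothetical tissues; the eigengene is high in root.
tissues = ["anther", "ear", "leaf", "root", "silk", "stalk"]
labels = np.repeat(tissues, 3)
rng = np.random.default_rng(2)
eigengene = np.where(labels == "root", 1.0, 0.0) + rng.normal(0, 0.05, labels.size)
hits = tissue_correlations(eigengene, labels)
print(hits)
```

The same Pearson correlation applied to a numeric trait, such as embryo age in days after pollination, gives the kind of developmental-age association reported below for Zm_mod13.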
To investigate the robustness of tissue-associated modules, we cross-referenced genes' module assignments with the list of tissue-specific genes reported by Sekhon et al. (2011) in a survey of maize transcriptomes. Their study reported 863 tissue-specific genes, of which we could trace only 276 to the present experiment because of differences between microarray platforms and our stringent oligo-to-gene mapping criteria (Figure 2). Remarkably, 75% (206) of the genes identified by Sekhon et al. as tissue specific were within network modules that were significantly correlated with the same tissue (Table II, Table S4). The 70 (25%) tissue-specific genes reported by Sekhon et al. that did not map to a module correlated with the same tissue may be explained, in part, by environmental effects that altered transcription profiles between experiments. In our study, all plants were grown in a controlled environment, whereas Sekhon et al. (2011) harvested young plants from the greenhouse and older plants from the field.
Modules are highly enriched for biological processes
Tissue-specific modules are often characterized by specific biological functions. We used the topGO package in R to identify Gene Ontology (GO) terms that appear in modules more frequently than expected by chance. Nine of the 16 modules that positively correlate with specific tissues are highly enriched for GO biological process terms (Fisher's Exact Test, P<0.0001; Table S5). These modules are associated with anther, ear, embryo, endosperm, leaf, root, pericarp, and tassel. The over-represented GO terms are often consistent with known tissue attributes. Module Zm_mod12 is associated with anthers (1010 genes, tissue correlation r=0.55, p=4 × 10⁻⁵) and is enriched for genes related to “sexual reproduction” (GO:0019953) (Table S5). The module Zm_mod11 (1109 genes, r=0.96, p=2 × 10⁻²⁹) is correlated with roots and has an over-representation of genes involved in “response to oxidative stress” (GO:0006979), “oxidation reduction” (GO:0055114), and “hydrogen peroxide catabolic process” (GO:0042744) (Table S5). Several tissues are associated with more than one module, suggesting that functionally distinct modules can share tissue specificity. Leaves have two modules with significant GO terms: Zm_mod06 and Zm_mod07. Zm_mod06 (2069 genes, r=0.83, p=6 × 10⁻¹⁴) is enriched for six GO biological process terms relating to photosynthesis (Table S5). This module contains genes encoding almost every enzyme in three pathways important for chlorophyll biosynthesis: biosynthesis of chlorophyllide a, biosynthesis of phytyl diphosphate, and biosynthesis of chlorophyll a (Figure S3). Phytyl diphosphate is the source of the phytyl chain in chlorophylls, and chlorophyllide a is an intermediate in chlorophyll biosynthesis. The biological process “glycolipid transport” (GO:0046836) is over-represented within module Zm_mod07 (1808 genes, r=0.44, p=1 × 10⁻³) (Table S5). Together, the annotations provide strong evidence of the functions of many tissue-specific modules.
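The enrichment test named above can be illustrated for a single GO term with a one-sided Fisher's exact test on a 2 × 2 table (module vs. rest of the annotated genes, annotated to the term vs. not). topGO also offers algorithms that account for the structure of the GO graph; this Python sketch corresponds only to its simplest per-term ("classic") style of test, and the paper does not state which topGO algorithm was used. Gene IDs are hypothetical, and no multiple-testing correction is applied.

```python
from scipy.stats import fisher_exact

def go_term_enrichment(module_genes, annotated_genes, term_genes):
    """One-sided Fisher's exact test for over-representation of one GO term.

    module_genes:    genes assigned to the module
    annotated_genes: the gene universe (all genes with GO annotation)
    term_genes:      genes annotated to the term being tested
    Returns (number of module genes with the term, p-value)."""
    universe = set(annotated_genes)
    module = set(module_genes) & universe
    term = set(term_genes) & universe
    in_both = len(module & term)
    module_only = len(module - term)
    term_only = len(term - module)
    neither = len(universe) - in_both - module_only - term_only
    table = [[in_both, module_only], [term_only, neither]]
    _, p = fisher_exact(table, alternative="greater")
    return in_both, p

# Toy example with hypothetical gene IDs: 1,000 annotated genes, a 50-gene term,
# and a 100-gene module that contains 40 of the term's genes (about 5 expected).
universe = [f"g{i}" for i in range(1000)]
term = universe[:50]
module = universe[:40] + universe[500:560]
k, p = go_term_enrichment(module, universe, term)
print(k, p)
```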
Four of the eight modules negatively correlated with a tissue type are significantly enriched for GO biological process terms. “Glycine betaine biosynthetic process” (GO:0031456) was enriched in module Zm_mod02 (4081 genes; r=−0.99, p=7 × 10⁻⁴⁰), which is negatively associated with anther tissue. Zm_mod01 is also negatively associated with anthers (4370 genes; r=−0.47, p=6 × 10⁻⁴) and was enriched for “translation” (GO:0006412) and “ribosome biogenesis” (GO:0003743) (Table S5). Genes within modules that were neither positively nor negatively correlated with a tissue did not have significant enrichment for any GO term (Figure 4, Table S5).
Modules capture developmental stages within tissues.
The relationships among leaf transcriptomes, and to a lesser degree among embryo transcriptomes, reflected the developmental age of the samples (Figure 3). Leaf samples include two samples of juvenile leaves and one sample of adult leaf prior to flowering (V1, V2, and V5; Table I). We also sampled the second leaf above the top ear one day before pollination and at four time points following pollination (10, 17, 24, and 31 days after pollination, DAP). A number of modules are correlated or anticorrelated either with green leaves sampled prior to anthesis and silking (V1, V2, and V5) or with green leaves sampled after flowering (R1 and 10, 17, 24, and 31 DAP; Figure 5). Other modules, including Zm_mod06, which is enriched for genes related to photosynthesis, have correlations with all leaf samples. Although modules distinguished leaves sampled before and after flowering, no module differentiated the two juvenile leaves from the single adult leaf. A heatmap of module eigengene correlations with individual tissues is shown in Figure S4.
The module eigengene for Zm_mod13 (691 genes) is very strongly correlated with embryo age (10, 17, 24, or 31 DAP; r=0.96, p=4 × 10⁻²⁷). This module is enriched for the GO term "embryonic development" (GO:0009790) (Table S5). Chloroplastic thiamine thiazole synthase 2, THI1-2 (GRMZM2G074097), an enzyme in the thiamine biosynthetic pathway, is within the Zm_mod13 module. The transcript abundance of thi1-2 increases in embryos from 15 to 36 days after pollination (Belanger et al., 1995; Figure S5).
Expression of transcription-related genes within modules
Transcription of specific sets of genes triggers molecular cascades that determine developmental fates (Kaufmann et al., 2010). Modules contain transcription-related genes that are correlated with module eigengenes. For example, the leaf-associated module Zm_mod14 (315 genes; r=0.61, p=2 × 10⁻⁶) contains the RNA polymerase sigma factor SIG2A (GRMZM2G143392), with an eigengene correlation, or module membership (MM), of 0.74 (Table S2). The sigma factor is encoded by a nuclear gene whose product is transported to the chloroplast, where it facilitates binding of the plastid-encoded RNA polymerase (PEP) to chloroplastic promoters, predominantly in leaf tissue containing mature chloroplasts (Lysenko, 2007). The eigengene of the endosperm-associated module Zm_mod09 (1403 genes; r=0.69, p=3 × 10⁻⁸) is highly correlated both with GRMZM2G118205, which encodes a protein similar to the Polycomb group protein FIE1 (FERTILIZATION-INDEPENDENT ENDOSPERM 1), and with GRMZM2G146283, which encodes a PBF (prolamin box-binding factor) protein (MM=0.93 and MM=0.98, respectively). Inheritance of a loss-of-function fie1 allele by the Arabidopsis thaliana female gametophyte results in embryo abortion (Ohad et al., 1996), and expression of the maize fie1 ortholog is restricted to embryo and endosperm tissue (Springer et al., 2002). PBF is thought to activate the expression of prolamin seed storage protein genes during endosperm development by binding to the prolamin box motif (TGTAAAG) (Vicente-Carbajosa et al., 1997).
We hypothesized that genes related to transcriptional control would have expression patterns more similar to each module's eigengene than other genes do. In ten modules, a transcription-related gene ranks among the top five when genes within the module are sorted by descending MM (Table S6). Nonetheless, in no module is the top rank of the transcription-related genes significantly higher than expected for other genes (at p<0.01; Table S6). We also evaluated whether genes classified with GO terms related to transcription have, on average, higher topological overlap scores than expected. The observed connectivity of transcription-related genes was not significantly greater than expected for any module (data not shown).
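The paper does not describe the null model behind the Table S6 p-values. One plausible version, sketched below in Python as an assumption, computes module membership (MM) as the Pearson correlation of each gene with the eigengene, takes the best (lowest) MM rank among transcription-related genes, and compares it with the best rank in random gene sets of the same size drawn from the same module. In the toy data a hidden factor stands in for the module eigengene, and the "transcription-related" flags are hypothetical.

```python
import numpy as np

def module_membership(expr, eigengene):
    """MM: Pearson correlation between each gene (column of expr) and the eigengene."""
    x = expr - expr.mean(axis=0)
    e = eigengene - eigengene.mean()
    return (x.T @ e) / (np.linalg.norm(x, axis=0) * np.linalg.norm(e))

def best_tf_rank_p(mm, is_tf, n_iter=2000, seed=0):
    """Best (lowest) MM rank among transcription-related genes and an empirical
    p-value from random equal-sized gene sets (rank 1 = highest MM)."""
    ranks = np.empty(mm.size, dtype=int)
    ranks[np.argsort(-mm)] = np.arange(1, mm.size + 1)
    observed = ranks[is_tf].min()
    n_tf = int(is_tf.sum())
    rng = np.random.default_rng(seed)
    null_best = np.array([rng.choice(ranks, size=n_tf, replace=False).min()
                          for _ in range(n_iter)])
    p = (np.sum(null_best <= observed) + 1) / (n_iter + 1)
    return int(observed), float(p)

# Toy module: 100 genes tracking a hidden factor (stand-in for the eigengene)
# with increasing noise; 10 hypothetical transcription-related genes, including
# the gene with the highest MM.
rng = np.random.default_rng(3)
factor = rng.normal(size=50)
noise_sd = np.linspace(0.2, 2.0, 100)
expr = factor[:, None] + rng.normal(size=(50, 100)) * noise_sd
mm = module_membership(expr, factor)
is_tf = np.zeros(100, dtype=bool)
is_tf[np.argsort(-mm)[[0, 20, 30, 40, 50, 60, 70, 80, 90, 99]]] = True

observed, p = best_tf_rank_p(mm, is_tf)
print(observed, p)
```

Even when a transcription-related gene holds the top MM rank, a set of 10 random genes from a 100-gene module contains the top-ranked gene about 10% of the time, so a top-five rank alone need not be significant; this illustrates why high MM for some regulators is compatible with the non-significant results above.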
Coordinated regulation of module genes may be due in part to regulation by shared transcription factors, which bind to specific promoter motifs upstream of the transcription initiation site. We used Fisher's Exact Test to determine whether any of 106 previously reported maize regulatory motifs is over-represented in the 500-bp upstream sequences of module genes. Six of the 24 modules are enriched for ten motifs (Table III). One notable motif, CC(A/G)CCC, is over-represented in Zm_mod01 and Zm_mod06. These modules are negatively and positively correlated with leaf tissue, respectively. The MNF1 (mitochondrial nucleoid factor 1) transcription factor is associated with CC(A/G)CCC and initiates transcription of Ppc1 (C4-type phosphoenolpyruvate carboxylase) in Zea mays mesophyll cells exposed to light (Morishima, 1998). Although plant promoters are often described as compact, there are exceptions to this general rule, so we also examined 1-kb, 1.5-kb, and 2-kb upstream sequences, which revealed some novel over-represented promoter motifs within modules. Of the ten motifs identified as over-represented in the 500-bp upstream sequences of certain modules, six were also over-represented when 1 kb of upstream sequence was examined, and four when 2 kb was examined (Table S7).
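A minimal Python sketch of the motif test, under stated assumptions: degenerate motifs are written with IUPAC codes (CC(A/G)CCC becomes CCRCCC), matches are searched on both strands, and a one-sided Fisher's exact test compares the number of module promoters containing at least one match with the number among background promoters (assumed here to be genes outside the module). The paper does not specify its counting scheme or background, and the toy sequences are hypothetical and much shorter than 500 bp.

```python
import re
from scipy.stats import fisher_exact

# IUPAC codes needed for simple degenerate motifs such as CC(A/G)CCC = CCRCCC.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")

def motif_regex(motif):
    """Regex matching an IUPAC motif on either DNA strand."""
    forward = "".join(IUPAC[b] for b in motif)
    reverse = "".join(IUPAC[b] for b in motif.translate(COMPLEMENT)[::-1])
    return re.compile(f"{forward}|{reverse}")

def motif_enrichment_p(motif, module_promoters, background_promoters):
    """One-sided Fisher's exact test on promoters with at least one motif match."""
    pattern = motif_regex(motif)

    def n_with_motif(seqs):
        return sum(1 for s in seqs if pattern.search(s.upper()))

    a = n_with_motif(module_promoters)
    b = n_with_motif(background_promoters)
    table = [[a, len(module_promoters) - a], [b, len(background_promoters) - b]]
    _, p = fisher_exact(table, alternative="greater")
    return p

# Toy promoters (hypothetical sequences).
module = ["AAAACCACCCAAAA", "TTGGGTGGTT", "ccgcccaa", "AAAAAAAA"]
background = ["ATATATATAT"] * 20 + ["GGCCACCCTT"]
p = motif_enrichment_p("CCRCCC", module, background)
print(f"p = {p:.4f}")
```

The second toy module sequence matches only through the reverse complement (GGGTGG), which is why both strands are searched.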
Module genes are significantly clustered within the genome
Previous work has shown that transcript levels of physically proximate genes are, on average, more highly correlated than expected by chance (Caron et al., 2001; Lercher et al., 2002; Zhan et al., 2006). We investigated whether module genes tend to have non-random genomic positions. Under the null hypothesis, the genomic position of each module gene is independent of all other module genes, and the number of module genes per chromosomal interval is expected to follow a Poisson distribution, whose mean equals its variance. For the thirteen modules that contain more than 200 genes (Table IV), we calculated a module dispersion score: the mean number of module genes per 300-kb segment of chromosomal DNA divided by the variance of that number. Scores below 1 indicate that module genes are more clustered than expected under the null. Although the mean number of genes per interval is lower than the variance for every module, only three of the thirteen modules have genes that are significantly (P<0.05) clustered (Table IV). Two of these, Zm_mod10 (1147 genes, r=0.53, p=9 × 10⁻⁵) and Zm_mod11 (described above), are positively associated with roots (Figure 4). The third, Zm_mod01, is negatively associated with leaf tissue.
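The text defines the dispersion score but not the significance test. The Python sketch below computes the score as described (mean divided by variance of module-gene counts in 300-kb bins) and, as an assumption, assesses it against random gene sets of the same size drawn from all gene positions, which preserves the uneven gene density of the genome. Chromosome names, lengths, and positions are synthetic.

```python
import numpy as np

BIN_SIZE = 300_000  # 300-kb chromosomal intervals, as described in the text

def bin_counts(chroms, positions, chrom_lengths):
    """Number of genes in each 300-kb bin, concatenated over chromosomes."""
    chroms = np.asarray(chroms)
    positions = np.asarray(positions, dtype=np.int64)
    counts = []
    for chrom, length in chrom_lengths.items():
        n_bins = -(-length // BIN_SIZE)  # ceiling division
        counts.append(np.bincount(positions[chroms == chrom] // BIN_SIZE,
                                  minlength=n_bins))
    return np.concatenate(counts)

def dispersion_score(chroms, positions, chrom_lengths):
    """Mean / variance of per-bin counts: ~1 under a Poisson null, < 1 if clustered."""
    c = bin_counts(chroms, positions, chrom_lengths)
    return c.mean() / c.var()

def clustering_p(module_idx, all_chroms, all_positions, chrom_lengths,
                 n_iter=1000, seed=0):
    """Hypothetical permutation test: draw random module-sized gene sets from
    all gene positions and count how often their score is as low as observed."""
    all_chroms = np.asarray(all_chroms)
    all_positions = np.asarray(all_positions)
    observed = dispersion_score(all_chroms[module_idx], all_positions[module_idx],
                                chrom_lengths)
    rng = np.random.default_rng(seed)
    null = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.choice(all_positions.size, size=len(module_idx), replace=False)
        null[i] = dispersion_score(all_chroms[idx], all_positions[idx], chrom_lengths)
    return observed, (np.sum(null <= observed) + 1) / (n_iter + 1)

# Toy genome: one 30-Mb chromosome (100 bins) with 1,000 genes; 50 genes sit in
# bin 10, and the module holds those 50 plus 50 genes scattered elsewhere.
rng = np.random.default_rng(4)
lengths = {"chr1": 30_000_000}
positions = np.concatenate([rng.integers(0, 30_000_000, 950),
                            rng.integers(3_000_000, 3_300_000, 50)])
chroms = np.full(positions.size, "chr1")
module_idx = np.concatenate([np.arange(950, 1000), np.arange(50)])
score, p = clustering_p(module_idx, chroms, positions, lengths, n_iter=500)
print(f"dispersion score = {score:.3f}, p = {p:.4f}")
```

Note that random gene sets also tend to score below 1 when gene density is uneven, which is consistent with the observation that every module had a mean lower than its variance while only three were significant.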
DISCUSSION
The concept of modularity in transcriptome analyses is that transcript abundance data can be partitioned into a collection of discrete and informative modules. Each module is self-contained and presumably performs a distinct task separate from the tasks of other modules. At the same time, the components of a transcriptome are dynamically interconnected, so a complex web of interactions defines transcript patterns and abundances. To investigate the interconnections and modular structure of plant developmental transcriptomes, we constructed a transcriptional network from a high-quality microarray data set derived from 50 Zea mays tissue types and developmental stages that represent the range of maize morphogenesis and span developmental time from embryogenesis to senescence. We clustered transcripts into a hierarchy with nested modules of increasing sizes and decreasing interconnectedness, and we identified 49 modules, 24 of which have robust interconnectivity.
Using a custom-designed Affymetrix microarray to assay transcript levels, we were able to map 60% (33,664 of 55,672) of the probe sets for which we detected target hybridization to the high-quality, filtered gene set originally reported by Schnable et al. (2009; Figure 2). Similarly, Sekhon et al. (2011) designed probe sets for a NimbleGen microarray from maize transcript assemblies and FGENESH gene models of the B73 genome sequence and found that about 70% of the probes matched filtered maize gene models. The maize filtered gene set is a conservative list of maize genes, and of the 55,672 probe sets for which we detected target hybridization, 9,919 (18%) map to the B73 genome but not to the maize gene models (Figure 2). Twenty-two percent of the expressed probe sets did not match the genome, perhaps because some probe sets were designed from mis-assembled unigenes. The maize genome also has gaps, and some transcripts may have arisen from genes that were not sequenced. Finally, some probe sequences may be derived from a transcribed gene that is absent from the B73 genome. Using comparative genomic hybridization (CGH) on NimbleGen arrays, Springer et al. (2009) found megabase-size B73 regions that contained genes and were absent from the genome of the maize inbred Mo17. Beló et al. (2010) and Swanson-Wagner et al. (2010) have also identified thousands of potential copy-number variations (CNVs) among Zea mays genomes. After eliminating cross-hybridizing and redundant probe sets, we identified 34,876 genes.
Network modules have strong associations with specific tissues and biological processes
Modules are composed of genes that have similar patterns of expression across all tissues. Nonetheless, twenty-two of our 24 robust modules (92%) are characterized by transcripts that are preferentially expressed or repressed within a specific tissue type relative to all other tissue types (Figure 4). Our results indicate that organ identity is a primary factor explaining transcriptome variation throughout plant development and suggest that organ identity is the key determinant of cellular function. This finding echoes the identification of numerous mutants that have aberrant cell structures but nonetheless show normal organ development (Smith et al., 1996). Whole-transcriptome comparisons consistently show that age, cell type, and environmental stimuli have a relatively minor effect on transcriptional profiles relative to organ type (Ma et al., 2005; Schmid et al., 2005; Druka et al., 2006; Jiao et al., 2009; Sekhon et al., 2011). In addition, large numbers of tissue-specific transcripts have been identified in plants (Ma et al., 2005; Druka et al., 2006; Sekhon et al., 2011). For example, of 18,481 detected transcripts from barley, 650 were expressed in only a single tissue type (Druka et al., 2006). Seventy-five percent of the maize tissue-specific genes reported by Sekhon et al. (2011) that we examined are found in the appropriate tissue-specific modules. Although environmental stimuli may rewire underlying network architectures (Luscombe et al., 2004), the congruence of these results indicates that the rough outlines of developmental network topologies are highly robust. Tissue-specific modules lead to two key expectations. First, mutations within the genes most highly connected, or central, to a tissue-specific module should have phenotypic effects that preferentially affect that tissue. Second, because a number of elicitor molecules drive organ change (for example, GA can induce floral feminization), genes differentially expressed in response to elicitor treatment should be over-represented in specific developmental modules (Zhan and Lukens, 2010).
Maturation signals and switches in cell identity also activate distinct transcriptional
regulatory modules within a single organ. By examining leaf tissues of different ages, we
identified modules preferentially expressed in leaves prior to anthesis and silking and
modules preferentially expressed after flowering. We also identified one module
eigengene with a strong correlation with embryo age. The detection of modules
correlated with different stages of a single organ was possible because of the breadth of
the transcription profiles collected in this study.
We find high congruence between modules' associations with specific tissue types and with biological processes. Twelve of the 24 verified modules were enriched for genes involved in specific biological processes, and all of these were positively or negatively associated with specific tissue types. Ficklin and Feltus (2011) developed a maize transcriptome network using 297 microarray datasets from various maize tissues and genotypes grown in a number of conditions, including forty-eight arrays from pulvinus and nine arrays of methylation-filtered genomic DNA. They identified clusters of genes enriched for ribosome and translation, seed storage activity, and photosynthesis (Ficklin and Feltus, 2011). The robust signal of the different tissues within these samples likely contributed to the functional enrichment of modules. Nonetheless, it would be interesting to determine whether modules cluster around functions independently of tissue type.
We expect that modules will be a valuable resource for predicting gene function. Many of the genes with high module membership have unknown functions. For these sparsely annotated genes, hub status in a particular tissue-specific module generates novel hypotheses through the principle of guilt-by-association. For example, the transcript detected by probe set Zm028519_at is highly correlated with the Zm_mod13 eigengene (MM=0.98) and with embryo age, and its expression profile in embryo is similar to that of GRMZM2G074097 (Figure S5), discussed previously. The Unigene sequence on which the Zm028519_at probe set was based is homologous (blastn versus EST database, e-value = 3 × 10⁻¹¹⁸) to an EST from a Sorghum bicolor embryo library. Nonetheless, the transcript does not map to the maize filtered gene set, has no known function, and is not detected in any tissue other than embryo (Figure S5).
Transcription-related genes are correlated with module eigengenes but are not notably central to modules.
Genes within a module are likely transcriptionally co-regulated. A number of transcription-associated genes have high module memberships (Table S2). These genes include a chloroplastic RNA polymerase protein, a chromatin remodeling enzyme, and a DOF transcription factor. Nonetheless, transcription factors and other genes involved in transcription were not unusually central to modules relative to genes with other functions (Table S6). We propose three explanations for this observation. First, transcription factors that are expressed in only one tissue may be rare. Transcription factors are likely active in more than a single condition and can alter their regulatory interactions between conditions (Luscombe et al., 2004; Brady et al., 2011). Second, precise regulation of module genes may arise through combinatorial protein interactions among transcription factors (Smaczniak et al., 2012). Finally, transcription factors that direct organ cell identity may act transiently to establish that identity early in development, a time point the present study did not capture. For example, heritable silencing of FLC (FLOWERING LOCUS C) expression is accomplished by transient expression of VIN3 (VERNALIZATION INSENSITIVE 3), which guides the heritable, epigenetic modification of FLC (Sung and Amasino, 2004). We note that binding sites of transcription factors can be over-represented among module genes: six modules have a significant over-representation of ten known promoter motifs (Table III). Genome-wide, verified binding sites of key developmental transcription factors (Bolduc et al., 2012; Morohashi et al., 2012) may further elucidate how genes within modules are co-regulated.
Some module members are physically clustered in the maize genome.
Of the thirteen modules with more than 200 genes, three have genes that are significantly more co-located in chromosomal regions than expected by chance (Table IV). Clusters of functionally related genes may arise because of a shared chromatin environment that promotes their co-regulation (Udvardy et al., 1985). Alternatively, epistasis among genes within modules seems feasible, as modules contain genes that act in biochemical pathways (e.g., Figure S3). Selection may have favored linkage of epistatic genes to reduce recombination between favorable alleles, thus contributing to the variability in linkage disequilibrium decay across the maize genome (Inghelandt et al., 2011).
Here, we begin to investigate factors that explain maize transcriptome variation across
development and the regulatory basis for that variation. Future work will improve the
resolution of the maize transcriptome modules by incorporating in-depth expression data
of diverse RNAs. The functional relationships among genes across development also will
likely be improved through integration of other data (Zhu et al., 2008). Finally, a major
objective will be to functionally characterize modules and to investigate how to alter
modules to drive developmental changes.
MATERIALS AND METHODS
Growth conditions and tissue sampling
An elite Syngenta Zea mays (maize) inbred (SRG200) was grown in a greenhouse at the University of Guelph during the summer of 2007. Growth conditions were 16-h days (~600 μmol m⁻² s⁻¹) at 28°C, 8-h nights at 23°C, and 50% relative humidity. Plants were grown semi-hydroponically in pots containing Turface® clay and watered with a modified Hoagland's solution containing 0.4 g/L 28-14-14 fertilizer, 0.4 g/L 15-15-30 fertilizer, 0.2 g/L NH4NO3, 0.4 g/L MgSO4·7H2O, and 0.03 g/L micronutrient mix (S, Co, Cu, Fe, Mn, Mo, and Zn). Plant samples representing fifty developmental stages were collected for RNA extraction (Table I, Figure 1, Figure S6). Three biological replicates per sample were harvested in the middle of the day to minimize complications due to diurnal changes in C and N metabolism. Total RNA was isolated, and cDNA synthesis, cRNA synthesis, and labeling followed a standard protocol recommended by Affymetrix. Labeled cRNAs were fragmented and hybridized to a custom maize GeneChip microarray. Array images were acquired and hybridization signals quantified with GeneChip Operating Software (GCOS; Affymetrix). Hybridization quality was assessed using Expressionist (GeneData), and hybridizations were repeated for arrays that failed the quality assessment.
Microarray attributes and data preparation
RNA was hybridized to a custom Affymetrix Unigene array with 82,662 probe sets, each consisting of 16 probes of 25 nucleotides. The 150 CEL files were normalized using the Robust Multichip Average (RMA) method from the affy package of Bioconductor (version 2.6) in the R statistical framework (version 2.11.0) (Gentleman et al., 2004; R Development Core Team, 2010). Probe sets were removed from the data set if their average signal across the three replicates was below the detection threshold (log2(100) = 6.64) in all 50 tissue types (26,989 probe sets). ANOVA was used to test whether replicates differed significantly. Two arrays, anthers replicate 1 and V1 leaf replicate 1, were removed from the analysis. Replicates were then averaged.
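The detection filter and replicate averaging described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' R/Bioconductor code. It assumes that the input values are already RMA-normalized log2 signals stored in a nested dictionary (`probe set -> tissue -> replicate values`); this layout is a hypothetical choice made for the example. A probe set is kept if its replicate mean reaches log2(100) in at least one tissue, which is the complement of the removal rule in the text.

```python
import math
from statistics import mean

# log2(100) ~= 6.64, the detection threshold on the RMA (log2) scale
DETECTION_THRESHOLD = math.log2(100)


def filter_and_average(expr):
    """expr: {probe_set: {tissue: [replicate log2 values]}} (hypothetical layout).

    Returns {probe_set: {tissue: replicate mean}}, dropping probe sets whose
    replicate mean is below the detection threshold in every tissue.
    Tissues with a removed array simply contribute fewer replicates.
    """
    kept = {}
    for probe_set, by_tissue in expr.items():
        means = {tissue: mean(reps) for tissue, reps in by_tissue.items()}
        # keep the probe set if it is detected in at least one tissue
        if any(m >= DETECTION_THRESHOLD for m in means.values()):
            kept[probe_set] = means
    return kept
```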
The microarray platform was annotated by determining homology between probe sequences and the cDNA sequences of predicted transcripts from the filtered gene set of the 4a.53 release of the B73 genome using BLAST (blastn; Altschul et al., 1990). A probe was considered to match a transcript if at least 23 of its 25 nucleotides matched contiguously, or if 24 of 25 nucleotides matched with an internal gap. Probes whose alignments covered more than 14 but fewer than 25 nucleotides at 85% identity were recorded as close matches. Close matches were used to identify cross-hybridization among transcripts, as described below. If at least 12 of the 16 probes in a probe set matched the same transcript, the probe set was assigned to that gene. If fewer than twelve but more than one probe in a probe set was a match or a close match for the same gene, the probe set was classified as a partial match for that gene.
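The probe-set assignment rules above can be expressed as a small classifier. This is a sketch with one stated assumption. The text does not say how to treat a gene that has 12 or more probes when close matches are counted but fewer than 12 full matches. Here any gene with fewer than 12 full matches but at least two matching or close-matching probes is treated as a partial match. The per-probe input format and the gene names are hypothetical.

```python
from collections import Counter


def classify_probe_set(probe_hits, min_full=12):
    """probe_hits: one dict per probe, mapping gene_id -> 'match' or 'close'.

    Returns ('match', genes), ('partial', genes) or ('none', []).
    """
    full = Counter()     # probes that fully match each gene
    any_hit = Counter()  # probes that match or close-match each gene
    for hits in probe_hits:
        for gene, kind in hits.items():
            any_hit[gene] += 1
            if kind == "match":
                full[gene] += 1
    matched = sorted(g for g, n in full.items() if n >= min_full)
    if matched:
        return "match", matched
    # assumption: >1 matching/close probe but <12 full matches => partial
    partial = sorted(g for g, n in any_hit.items() if n > 1)
    if partial:
        return "partial", partial
    return "none", []
```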
To identify expressed transcripts that did not correspond to the filtered set of gene models, we searched genomic DNA for the cDNA sequences from which the probes were designed. Exonerate (Slater and Birney, 2005), an aligner that uses more exhaustive heuristics than BLAST, was used to map the probe sets to genomic sequence. The "--model est2genome" option was used, which allows for introns of reasonable length. These Unigene sequences and their genomic positions were combined with the previously identified B73 gene models. We then eliminated probe sets that were predicted to cross-hybridize or that were redundant. For each redundant collection of probe sets, a single probe set was retained according to the following criteria: the retained probe set 1) had the best alignment, 2) matched more transcripts of the gene than the other probe sets, and 3) had the highest maximum expression.
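The selection of one representative from a redundant collection can be sketched as a lexicographic comparison. The text lists three criteria but does not say how they are combined. This example assumes that they are applied in the order given, with each later criterion only breaking ties in the earlier ones. The `ProbeSet` fields are hypothetical, and in particular `alignment_score` is an assumed numeric summary of alignment quality.

```python
from dataclasses import dataclass


@dataclass
class ProbeSet:
    name: str
    alignment_score: float    # hypothetical: higher = better alignment
    transcripts_matched: int  # number of the gene's transcripts matched
    max_expression: float     # maximum log2 expression across the 50 samples


def pick_representative(redundant):
    """Keep one probe set, comparing criteria 1-3 in order (assumed tie-breaking)."""
    return max(
        redundant,
        key=lambda p: (p.alignment_score, p.transcripts_matched, p.max_expression),
    )
```

Python compares tuples element by element, so the second and third criteria only matter when the earlier ones are tied.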
Creating modules of coexpressed genes
All module construction was performed with the WGCNA software (Langfelder and Horvath, 2008). A correlation network is fully specified by its adjacency matrix, which contains the connection strength between each pair of genes. All 34,876 probe sets were analyzed as a single block. To calculate the adjacency matrix, we first calculated the Pearson correlation coefficient (s_ij) between each pair of probe sets across all developmental time points. The adjacency of two genes is then a power of a linear rescaling of their correlation coefficient:

$$ a_{ij} = \left( \frac{1 + s_{ij}}{2} \right)^{\beta} $$

where a_ij is the adjacency of gene i and gene j, s_ij is the Pearson correlation between gene i and gene j, and β is the weight. Because s_ij is rescaled rather than replaced by its absolute value, this coexpression similarity measure preserves information about negative correlations: strongly negatively correlated genes receive adjacencies near zero rather than near one. The weight highlights the strongest correlations while reducing the mean connectivity of the network, i.e. the average number of connections per probe set (Figure S7). Weighted networks are relatively robust to the choice of the power. Gene coexpression networks have been found to exhibit a scale-free topology. In such networks the connectivity distribution follows a power law, so that only a small number of very highly connected nodes exist (Barabasi and Oltvai, 2004; Chung et al., 2006). Using the scale-free topology criterion described by Langfelder and Horvath (2008), we selected a β value of 5 (Figure S7).
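The signed adjacency and the scale-free topology check can be sketched without WGCNA. This is a plain-Python illustration of the idea, not WGCNA's `pickSoftThreshold`. Connectivity is taken as the summed adjacency of each node. Nodes are grouped into equal-width connectivity bins, and log10 of each bin's frequency is regressed on log10 of its mean connectivity. A good scale-free fit gives a high R² with a negative slope. The bin count and the function names are assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signed_adjacency(profiles, beta=5):
    """a_ij = ((1 + s_ij) / 2) ** beta for each pair; diagonal left at 0."""
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s_ij = pearson(profiles[i], profiles[j])
            adj[i][j] = adj[j][i] = ((1 + s_ij) / 2) ** beta
    return adj


def scale_free_fit(adj, n_bins=10):
    """Regress log10 p(k) on log10 k over connectivity bins; return (R^2, slope)."""
    k = [sum(row) for row in adj]  # connectivity = summed adjacency per node
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0
    bins = [[] for _ in range(n_bins)]
    for ki in k:
        bins[min(int((ki - lo) / width), n_bins - 1)].append(ki)
    xs, ys = [], []
    for b in bins:
        if b and sum(b) > 0:
            xs.append(math.log10(sum(b) / len(b)))  # mean connectivity of the bin
            ys.append(math.log10(len(b) / len(k)))  # fraction of nodes in the bin
    if len(xs) < 2:
        return 0.0, 0.0
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if syy == 0:
        return 0.0, 0.0
    return sxy * sxy / (sxx * syy), sxy / sxx
```

To reproduce the spirit of Figure S7A, call `scale_free_fit(signed_adjacency(profiles, beta))` for a range of β values. Then pick the lowest β whose fit is acceptably close to scale-free.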
We used the topological overlap measure (TO) to transform the adjacency matrix into a topological overlap (similarity) matrix using WGCNA in R. A correlation considers each pair of genes in isolation. Topological overlap, by contrast, considers each pair of genes in relation to all other genes in the network (Ravasz et al., 2002; Li and Horvath, 2007; Yip and Horvath, 2007). Two genes have a high TO if they share high correlations with a common set of other genes. The use of TO filters out spurious or isolated connections (Oldham et al., 2008). Network relationships among genes were identified by hierarchical clustering of the dissimilarity matrix (i.e. one minus the topological overlap matrix). A dynamic tree-cutting algorithm was used to "cut" the dendrogram and define the modules. The tree-cutting algorithm iteratively decomposes and combines branches until a stable number of clusters is reached (Langfelder and Horvath, 2008). A summary profile, or eigengene, was calculated for each module by principal component analysis (Langfelder and Horvath, 2007). The first principal component of each module's gene expression matrix was retained as the representative module eigengene (ME). This procedure yielded forty-nine coexpression modules.
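Topological overlap can be illustrated with the standard formula TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij). Here l_ij = Σ_u a_iu·a_uj is the summed product of the two genes' adjacencies to every third gene, and k_i is the connectivity of gene i. The sketch below is plain Python on a small adjacency matrix with a zero diagonal. It is not WGCNA's optimized `TOMsimilarity`, and it does not reproduce the hierarchical clustering or dynamic tree cut.

```python
def topological_overlap(adj):
    """adj: symmetric adjacency (list of lists) with a zero diagonal.

    Returns the topological overlap matrix (similarity, diagonal = 1).
    """
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # shared neighbours: sum over third genes u of a_iu * a_uj
            l_ij = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            t = (l_ij + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
            tom[i][j] = tom[j][i] = t
    return tom


def tom_dissimilarity(tom):
    """One minus TOM, the input to hierarchical clustering."""
    return [[1 - t for t in row] for row in tom]
```

In a chain 0–1–2 with no direct 0–2 link, genes 0 and 2 still share neighbour 1. Their overlap is therefore 0.5 rather than the zero that their direct adjacency would suggest.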
We validated the 49 modules by examining the average TO among all genes in each module. The mean TO of a genuine module should be significantly higher than that of a module made up of randomly selected genes. To obtain the mean TOs of such random modules, we randomly assigned the 34,876 genes to 49 modules of the same sizes as the observed modules. This process was repeated 50,000 times to obtain 49 null distributions. For each module, the probability that a random set of genes could generate a mean TO greater than or equal to the observed mean TO was estimated as the fraction of the 50,000 iterations in which the random group of genes had a mean TO greater than or equal to the observed value. Twenty-four of the 49 modules were verified as highly significant (P < 10⁻⁵; Table S3).
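The validation test can be sketched for a single module. A simplification is assumed here: instead of partitioning all genes into 49 modules at once, each iteration draws a random gene set of the module's size. The text's joint assignment produces the same marginal null for each module. The TOM is a small in-memory matrix, and the iteration count is a parameter.

```python
import random


def module_mean_tom(tom, members):
    """Mean topological overlap over all pairs of genes in a module."""
    pairs = [(i, j) for idx, i in enumerate(members) for j in members[idx + 1:]]
    return sum(tom[i][j] for i, j in pairs) / len(pairs)


def permutation_p(tom, members, n_iter=50_000, seed=1):
    """Fraction of random same-size gene sets with mean TO >= the observed one."""
    rng = random.Random(seed)
    observed = module_mean_tom(tom, members)
    all_genes = list(range(len(tom)))
    hits = 0
    for _ in range(n_iter):
        random_set = rng.sample(all_genes, len(members))
        if module_mean_tom(tom, random_set) >= observed:
            hits += 1
    return observed, hits / n_iter
```

With 50,000 iterations, P < 10⁻⁵ corresponds to no random set reaching the observed mean TO.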
Testing tissue specific transcript abundance
To test if modules were associated with preferential expression in distinct tissues, arrays
arising from the same tissue at different developmental stages were classified together
(e.g. leaf, root, shoot, etc.; Table I). We created a binary indicator variable (tissue = 1; all
other samples = 0) and determined if any module eigengenes were significantly
correlated with the indicator. Positive correlation between a module eigengene and a
tissue type indicates that probe sets in that module have high transcript levels in that
tissue relative to all other tissues. Negative correlation between a module eigengene and a
tissue type indicates that probe sets in that module have low transcript levels in that tissue
relative to all other tissues. Using the eigengene is similar to averaging the correlations
between the expression profiles for each gene in the module with the tissue type, but
avoids the multiple testing problem. Because modules have varying extents of
heterogeneity in gene expression, not all modules are represented equally well by the
ME. The module membership (MM) for each gene within a module is the Pearson
correlation between the expression level of the gene and the module eigengene (Horvath
and Dong, 2008). MM is a quantitative measure of the degree to which a gene is central
to a module.
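The eigengene, module membership and eigengene–trait correlation described above can be sketched in plain Python. This example is not WGCNA's `moduleEigengenes`. It standardizes each gene across samples, finds the first principal component of the genes-by-samples matrix by power iteration on XᵀX, and orients the result to follow the module's average profile. MM is then the Pearson correlation of each gene with the ME, and a tissue test is the Pearson correlation of the ME with a 0/1 indicator. A P value for r could be derived from t = r√(n−2)/√(1−r²), but that step is omitted here.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(v):
    """Centre to mean 0 and scale to unit sample standard deviation."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / (n - 1))
    return [(a - m) / sd for a in v]


def module_eigengene(profiles, n_iter=100):
    """First principal component (over samples) of the standardized module matrix."""
    X = [standardize(p) for p in profiles]
    n_s = len(X[0])
    v = list(X[0])  # non-zero start vector
    for _ in range(n_iter):
        xv = [sum(row[s] * v[s] for s in range(n_s)) for row in X]              # X v
        w = [sum(X[g][s] * xv[g] for g in range(len(X))) for s in range(n_s)]  # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    avg = [sum(row[s] for row in X) / len(X) for s in range(n_s)]
    if pearson(v, avg) < 0:  # orient the ME to follow the average profile
        v = [-c for c in v]
    return v


def module_membership(profiles, me):
    """MM: Pearson correlation of each gene's profile with the eigengene."""
    return [pearson(p, me) for p in profiles]
```

For example, `pearson(me, indicator)` with a hypothetical `indicator = [0, 0, 0, 1, 1]` tests whether the module is preferentially expressed in the last two samples.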
We used a similar approach to identify modules specific to tissues at different
developmental stages. We divided leaf tissues into two groups, leaves harvested prior to
anthesis and silking (V1, V2, and V5), and leaves harvested after flowering (R1 and 10,
17, 24, and 31 DAP; Table I). We assigned all samples a value of zero except those from
the group under consideration, which were assigned one, and we performed module
correlations as described above. To investigate modules that correlate with embryo
development, we assigned non-embryo tissues zero, and embryo samples were assigned
either 10, 17, 24, or 31, based on the number of days between pollination and the day
they were sampled. Correlations were performed as above.
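The trait vectors used in these correlations can be built as below. The sample list is a hypothetical toy subset of Table I, written as (stage, tissue group) pairs. Leaf samples are split into vegetative (V1, V2, V5) and reproductive (R1 and DAP) groups. Embryo samples are coded by days after pollination, and all other samples are coded 0. Each vector would then be correlated with the module eigengenes as described above.

```python
# Hypothetical toy subset of Table I: (developmental stage, tissue group)
SAMPLES = [
    ("V1", "leaf"), ("V2", "leaf"), ("V5", "leaf"), ("R1", "leaf"),
    ("10DAP", "leaf"), ("10DAP", "embryo"), ("17DAP", "embryo"), ("V2", "root"),
]
VEGETATIVE = {"V1", "V2", "V5"}
REPRODUCTIVE = {"R1", "10DAP", "17DAP", "24DAP", "31DAP"}


def indicator(samples, predicate):
    """1 for samples in the group under consideration, 0 otherwise."""
    return [1 if predicate(stage, group) else 0 for stage, group in samples]


def embryo_dap(samples):
    """Embryo samples coded by days after pollination; all others 0."""
    return [
        int(stage[:-3]) if group == "embryo" and stage.endswith("DAP") else 0
        for stage, group in samples
    ]


tissue_leaf = indicator(SAMPLES, lambda st, g: g == "leaf")
veg_leaf = indicator(SAMPLES, lambda st, g: g == "leaf" and st in VEGETATIVE)
rep_leaf = indicator(SAMPLES, lambda st, g: g == "leaf" and st in REPRODUCTIVE)
embryo_trait = embryo_dap(SAMPLES)
```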
GO enrichment analysis
GO enrichment analysis of modules was performed using the topGO package in R. We used GO annotations derived from BLAST2GO (Conesa et al., 2005), based on the cDNA sequences used to design the Affymetrix microarray platform and the NCBI ‘nr’ database (July 16, 2009). In total, 17,139 genes were assigned 2,684 unique GO terms. For each module, Fisher’s exact test was used to identify GO terms that occur more frequently than expected given their frequency among all genes in the analysis. We applied the elim method, which removes genes annotated to a significant, more specific GO term before testing that term's more general ancestor terms. This method has been shown to reduce the rate of false positives (Alexa et al., 2006). Within each module, we selected genes with a module membership greater than 0.5, as these genes most resemble the module eigengene. We defined a GO biological process as significantly associated with a module if it had a P value less than 0.001. This strict criterion was used to eliminate GO terms that were present only once or twice within a module but were nonetheless highly significant because few genes in the data set carried that GO term.
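The per-term test can be illustrated with a one-sided Fisher's exact test, written as the upper tail of the hypergeometric distribution. This sketch covers only a single term; topGO's elim graph decorrelation is not reproduced. The arguments are counts: k module genes carry the term, the module has n genes, K genes in the background carry the term, and the background has N genes.

```python
from math import comb


def fisher_enrichment_p(k, module_size, K, N):
    """One-sided Fisher's exact test: P(X >= k), X ~ Hypergeometric(N, K, module_size).

    k: module genes annotated with the GO term
    module_size: genes tested in the module (e.g. those with MM > 0.5)
    K: genes annotated with the term in the whole analysis
    N: all annotated genes in the analysis
    """
    upper = min(K, module_size)
    hits = sum(comb(K, x) * comb(N - K, module_size - x) for x in range(k, upper + 1))
    return hits / comb(N, module_size)
```

Under the criterion above, a term would be called significant when this P value falls below 0.001.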
The module membership of transcription-related genes
To investigate the centrality of transcription factors within modules, probe sets in each module were ordered by their module membership (MM) score. Of the 17,139 genes with GO annotations, we identified 1,063 genes with transcription-related GO terms (e.g. “transcription activator activity”, “transcription cofactor activity”, “transcription initiation, DNA-dependent”; Table S6). For each module, we recorded the rank of the transcription-related gene with the highest MM. To determine whether this rank was higher than expected by chance, we compared it with a null distribution of the highest rank of a transcription-related gene. This distribution was obtained by randomly permuting the order of genes within the module 100,000 times. We also permuted the order of a random set of genes of the same size as the module and determined the highest rank of a transcription-related gene in that set.
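The within-module permutation test can be sketched as follows. In this example, a smaller rank number means a higher MM, and the P value is the fraction of random orderings in which a transcription-related gene appears at or above the observed rank. The input dictionary of MM scores and the gene identifiers are hypothetical.

```python
import random


def best_tf_rank(genes_in_order, tf_genes):
    """1-based rank of the first transcription-related gene, or None if absent."""
    for rank, gene in enumerate(genes_in_order, start=1):
        if gene in tf_genes:
            return rank
    return None


def rank_permutation_p(mm, tf_genes, n_iter=100_000, seed=1):
    """mm: {gene: module membership}. Returns (observed best rank, permutation P)."""
    ordered = sorted(mm, key=mm.get, reverse=True)  # highest MM first
    observed = best_tf_rank(ordered, tf_genes)
    if observed is None:
        return None, 1.0
    rng = random.Random(seed)
    shuffled = list(ordered)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        if best_tf_rank(shuffled, tf_genes) <= observed:
            hits += 1
    return observed, hits / n_iter
```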
Analysis of promoter motif enrichment within modules
To determine whether modules were enriched for genes containing particular cis-acting promoter motifs, we counted the number of genes within each module that contained each of 106 promoter motifs obtained from PlantCARE (Rombauts et al., 1999), PLACE (Higo et al., 1999), and GRASSIUS (Yilmaz et al., 2009). All of these motifs have been reported to be transcription factor binding sites in maize. This analysis was limited to the 13,047 genes that had been mapped to the filtered set of maize gene models. For each motif, we used Fisher’s exact test (significance threshold P < 0.01) to compare two counts: the number of module genes containing the motif within 500 bp upstream of the transcription start site, and the number of non-module genes containing it. Where multiple transcripts were annotated, the transcription start site was taken from the transcript that mapped furthest upstream. We also examined promoter regions of 1,000, 1,500, and 2,000 bp.
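Motif counting can be sketched with regular expressions. The example converts the degenerate notation used in Table III, such as `CC(A/G)CCC`, into a character class. It assumes that each promoter sequence is given 5′→3′ and ends at the transcription start site, and it searches only that strand. The source does not say whether the reverse complement was also searched. The resulting in-module and out-of-module counts would then feed a Fisher's exact test.

```python
import re


def motif_regex(motif):
    """Convert e.g. 'CC(A/G)CCC' into the regex 'CC[AG]CCC'."""
    pattern = re.sub(
        r"\(([ACGT](?:/[ACGT])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )
    return re.compile(pattern)


def has_motif(upstream_seq, motif, window=500):
    """True if the motif occurs in the `window` bp immediately upstream of the TSS."""
    region = upstream_seq[-window:].upper()
    return motif_regex(motif).search(region) is not None


def count_hits(promoters, module, motif, window=500):
    """promoters: {gene: upstream sequence}. Returns (in-module, out-of-module) counts."""
    in_mod = sum(has_motif(seq, motif, window) for g, seq in promoters.items() if g in module)
    out_mod = sum(has_motif(seq, motif, window) for g, seq in promoters.items() if g not in module)
    return in_mod, out_mod
```

Changing `window` to 1,000, 1,500 or 2,000 corresponds to the larger promoter regions mentioned above.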
Physical clustering of module genes
To determine whether the genes in our modules were clustered in the maize genome, we compared the observed dispersion of gene density with the expected dispersion for each module. The dispersion statistic was defined as the mean divided by the variance, so lower values indicate stronger clustering. It was calculated by counting module genes in 300-kb sliding windows with a 100-kb step and recording the mean and variance of these counts. To calculate the expected dispersion statistic, we randomly shuffled the module assignments of genes 100,000 times and built a null distribution from the dispersion statistic of each permuted data set. This procedure was applied to the 13 modules that had at least 200 genes with known genomic positions (Table IV).
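The clustering test can be sketched with sliding-window counts. Several details are assumptions: windows are placed only where a full 300-kb window fits on a chromosome, variance is the population variance, and shuffling module assignments is approximated by drawing random gene positions of the module's size. The dispersion follows the text's definition of mean divided by variance, so the P value counts random draws with an equal or lower dispersion.

```python
import bisect
import random
from statistics import mean, pvariance


def window_counts(positions, chrom_lengths, window=300_000, step=100_000):
    """positions: [(chrom, bp)] of module genes; returns counts per sliding window."""
    counts = []
    for chrom, length in chrom_lengths.items():
        pos = sorted(bp for c, bp in positions if c == chrom)
        for start in range(0, max(length - window, 0) + 1, step):
            counts.append(bisect.bisect_left(pos, start + window) - bisect.bisect_left(pos, start))
    return counts


def dispersion(counts):
    """Mean divided by variance (as defined in the text); lower = more clustered."""
    var = pvariance(counts)
    return mean(counts) / var if var > 0 else float("inf")


def clustering_test(module_pos, all_pos, chrom_lengths, n_iter=1000, seed=1):
    """Fraction of random same-size gene sets with dispersion <= the observed one."""
    rng = random.Random(seed)
    observed = dispersion(window_counts(module_pos, chrom_lengths))
    hits = 0
    for _ in range(n_iter):
        shuffled = rng.sample(all_pos, len(module_pos))
        if dispersion(window_counts(shuffled, chrom_lengths)) <= observed:
            hits += 1
    return observed, hits / n_iter
```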
ACKNOWLEDGEMENTS
This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET: www.sharcnet.ca) and Compute/Calcul Canada.
LITERATURE CITED
Alexa A, Rahnenführer J, Lengauer T (2006) Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22: 1600-1607
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410
Barabasi A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101-113
Belanger FC, Leustek T, Chu B, Kriz AL (1995) Evidence for the thiamine biosynthetic pathway in higher-plant plastids and its developmental regulation. Plant Molecular Biology 29: 809-821
Beló A, Beatty MK, Hondred D, Fengler KA, Li B, Rafalski A (2010) Allelic genome structural variations in maize detected by array comparative genome hybridization. Theor Appl Genet 120: 355-367
Benedito VA, Torres-Jerez I, Murray JD, Andriankaja A, Allen S, Kakar K, Wandrey M, Verdier J, Zuber H, Ott T, Moreau S, Niebel A, Frickey T, Weiller G, He J, Dai X, Zhao PX, Tang Y, Udvardi MK (2008) A gene expression atlas of the model legume Medicago truncatula. The Plant Journal 55: 504-513
Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685-1690
Brady SM, Zhang L, Megraw M, Martinez NJ, Jiang E, Yi CS, Liu W, Zeng A, Taylor-Teeples M, Kim D, Ahnert S, Ohler U, Ware D, Walhout AJM, Benfey PN (2011) A stele-enriched gene regulatory network in the Arabidopsis root. Mol Syst Biol 7: 459
Caron H, van Schaik B, van der Mee M, Baas F, Riggins G, van Sluis P, Hermus M-C, van Asperen R, Boon K, Voûte PA, Heisterkamp S, van Kampen A, Versteeg R (2001) The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291: 1289-1292
Chung W-Y, Albert R, Albert I, Nekrutenko A, Makova K (2006) Rapid and asymmetric divergence of duplicate genes in the human gene coexpression network. BMC Bioinformatics 7: 46
Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674-3676
Druka A, Muehlbauer G, Druka I, Caldo R, Baumann U, Rostoks N, Schreiber A, Wise R, Close T, Kleinhofs A, Graner A, Schulman A, Langridge P, Sato K, Hayes P, McNicol J, Marshall D, Waugh R (2006) An atlas of gene expression from seed to seed through barley development. Functional & Integrative Genomics 6: 202-211
Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244-1256
Ficklin SP, Luo F, Feltus FA (2010) The association of multiple interacting genes with specific phenotypes in rice using gene coexpression networks. Plant Physiology 154: 13-24
Fierro AC, Vandenbussche F, Engelen K, Van de Peer Y, Marchal K (2008) Meta analysis of gene expression data within and across species. Curr Genomics 9: 525-534
Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang J, Zhang J (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 5: R80
Higo K, Ugawa Y, Iwamoto M, Korenaga T (1999) Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res 27: 297-300
Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117
Van Inghelandt D, Reif JC, Dhillon BS, Flament P, Melchinger AE (2011) Extent and genome-wide distribution of linkage disequilibrium in commercial maize germplasm. Theoretical and Applied Genetics 123: 11-20
Jiao Y, Lori Tausta S, Gandotra N, Sun N, Liu T, Clay NK, Ceserani T, Chen M, Ma L, Holford M, Zhang H-y, Zhao H, Deng X-W, Nelson T (2009) A transcriptome atlas of rice cell types uncovers cellular, functional and developmental hierarchies. Nat Genet 41: 258-263
Kaufmann K, Pajoro A, Angenent GC (2010) Regulation of transcription in plants: mechanisms controlling developmental switches. Nat Rev Genet 11: 830-842
Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 1: 54
Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559
Lercher MJ, Urrutia AO, Hurst LD (2002) Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet 31: 180-183
Li A, Horvath S (2007) Network neighborhood analysis with the multi-node topological overlap measure. Bioinformatics 23: 222-231
Luscombe NM, Madan Babu M, Yu H, Snyder M, Teichmann SA, Gerstein M (2004) Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 431: 308-312
Lysenko E (2007) Plant sigma factors and their role in plastid transcription. Plant Cell Reports 26: 845-859
Ma L, Sun N, Liu X, Jiao Y, Zhao H, Deng XW (2005) Organ-specific expression of arabidopsis genome during development. Plant Physiology 138: 80-91
Mochida K, Uehara-Yamaguchi Y, Yoshida T, Sakurai T, Shinozaki K (2011) Global landscape of a co-expressed gene network in barley and its application to gene discovery in Triticeae crops. Plant and Cell Physiology 52: 785-803
Moreno-Risueno MA, Busch W, Benfey PN (2010) Omics meet networks -- using systems approaches to infer regulatory networks in plants. Current Opinion in Plant Biology 13: 126-131
Morishima A (1998) Identification of preferred binding sites of a light-inducible DNA-binding factor (MNF1) within 5′-upstream sequence of C4-type phosphoenolpyruvate carboxylase gene in maize. Plant Molecular Biology 38: 633-646
Morohashi K, Casas MI, Falcone Ferreyra L, Mejia-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, Rodriguez E, Pellegrinet S, McMullen M, Casati P, Grotewold E (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-2764
Movahedi S, Van de Peer Y, Vandepoele K (2011) Comparative network analysis reveals that tissue specificity and gene function are important factors influencing the mode of expression evolution in Arabidopsis and rice. Plant Physiology
Mutwil M, Klie S, Tohge T, Giorgi FM, Wilkins O, Campbell MM, Fernie AR, Usadel B, Nikoloski Z, Persson S (2011) PlaNet: combined sequence and expression comparisons across plant networks derived from seven species. Plant Cell 23: 895-910
Ohad N, Margossian L, Hsu YC, Williams C, Repetti P, Fischer RL (1996) A mutation that allows endosperm development without fertilization. Proceedings of the National Academy of Sciences 93: 5319-5324
Oldham MC, Konopka G, Iwamoto K, Langfelder P, Kato T, Horvath S, Geschwind DH (2008) Functional organization of the transcriptome in human brain. Nat Neurosci 11: 1271-1282
R Development Core Team (2010) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297: 1551-1555
Rombauts S, Dehais P, Van Montagu M, Rouze P (1999) PlantCARE, a plant cis-acting regulatory element database. Nucleic Acids Res 27: 295-296
Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Schölkopf B, Weigel D, Lohmann JU (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37: 501-506
Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, Minx P, Reily AD, Courtney L, Kruchowski SS, Tomlinson C, Strong C, Delehaunty K, Fronick C, Courtney B, Rock SM, Belter E, Du F, Kim K, Abbott RM, Cotton M, Levy A, Marchetto P, Ochoa K, Jackson SM, Gillam B, Chen W, Yan L, Higginbotham J, Cardenas M, Waligorski J, Applebaum E, Phelps L, Falcone J, Kanchi K, Thane T, Scimone A, Thane N, Henke J, Wang T, Ruppert J, Shah N, Rotter K, Hodges J, Ingenthron E, Cordes M, Kohlberg S, Sgro J, Delgado B, Mead K, Chinwalla A, Leonard S, Crouse K, Collura K, Kudrna D, Currie J, He R, Angelova A, Rajasekar S, Mueller T, Lomeli R, Scara G, Ko A, Delaney K, Wissotski M, Lopez G, Campos D, Braidotti M, Ashley E, Golser W, Kim H, Lee S, Lin J, Dujmic Z, Kim W, Talag J, Zuccolo A, Fan C, Sebastian A, Kramer M, Spiegel L, Nascimento L, Zutavern T, Miller B, Ambroise C, Muller S, Spooner W, Narechania A, Ren L, Wei S, Kumari S, Faga B, Levy MJ, McMahan L, Van Buren P, Vaughn MW, Ying K, Yeh C-T, Emrich SJ, Jia Y, Kalyanaraman A, Hsia A-P, Barbazuk WB, Baucom RS, Brutnell TP, Carpita NC, Chaparro C, Chia J-M, Deragon J-M, Estill JC, Fu Y, Jeddeloh JA, Han Y, Lee H, Li P, Lisch DR, Liu S, Liu Z, Nagel DH, McCann MC, SanMiguel P, Myers AM, Nettleton D, Nguyen J, Penning BW, Ponnala L, Schneider KL, Schwartz DC, Sharma A, Soderlund C, Springer NM, Sun Q, Wang H, Waterman M, Westerman R, Wolfgruber TK, Yang L, Yu Y, Zhang L, Zhou S, Zhu Q, Bennetzen JL, Dawe RK, Jiang J, Jiang N, Presting GG, Wessler SR, Aluru S, Martienssen RA, Clifton SW, McCombie WR, Wing RA, Wilson RK (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115
Sekhon RS, Lin H, Childs KL, Hansey CN, Buell CR, de Leon N, Kaeppler SM (2011) Genome-wide atlas of transcription during maize development. The Plant Journal 66: 553-563
Slater G, Birney E (2005) Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6: 31
Smaczniak C, Immink RG, Muino JM, Blanvillain R, Busscher M, Busscher-Lange J, Dinh QD, Liu S, Westphal AH, Boeren S, Parcy F, Xu L, Carles CC, Angenent GC, Kaufmann K (2012) Characterization of MADS-domain transcription factor complexes in Arabidopsis flower development. Proc Natl Acad Sci U S A 109: 1560-1565
Smith LG, Hake S, Sylvester AW (1996) The tangled-1 mutation alters cell division orientations throughout maize leaf development without altering leaf shape. Development 122: 481-489
Springer NM, Danilevskaya ON, Hermon P, Helentjaris TG, Phillips RL, Kaeppler HF, Kaeppler SM (2002) Sequence relationships, conserved domains, and expression patterns for maize homologs of the polycomb group genes E(z), esc, and E(Pc). Plant Physiology 128: 1332-1345
Springer NM, Ying K, Fu Y, Ji T, Yeh C-T, Jia Y, Wu W, Richmond T, Kitzman J, Rosenbaum H, Iniguez AL, Barbazuk WB, Jeddeloh JA, Nettleton D, Schnable PS (2009) Maize inbreds exhibit high levels of copy number variation (CNV) and presence/absence variation (PAV) in genome content. PLoS Genet 5: e1000734
Sung S, Amasino RM (2004) Vernalization in Arabidopsis thaliana is mediated by the PHD finger protein VIN3. Nature 427: 159-164
Swanson-Wagner RA, Eichten SR, Kumari S, Tiffin P, Stein JC, Ware D, Springer NM (2010) Pervasive gene content variation and copy number variation in maize and its undomesticated progenitor. Genome Research
Udvardy A, Maine E, Schedl P (1985) The 87A7 chromomere: identification of novel chromatin structures flanking the heat shock locus that may define the boundaries of higher order domains. Journal of Molecular Biology 185: 341-358
Vicente-Carbajosa J, Moose SP, Parsons RL, Schmidt RJ (1997) A maize zinc-finger protein binds the prolamin box in zein gene promoters and interacts with the basic leucine zipper transcriptional activator Opaque2. Proceedings of the National Academy of Sciences 94: 7685-7690
Yilmaz A, Nishiyama MY, Fuentes BG, Souza GM, Janies D, Gray J, Grotewold E (2009) GRASSIUS: A Platform for Comparative Regulatory Genomics across the Grasses. Plant Physiology 149: 171-180
Yip AM, Horvath S (2007) Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinformatics 8: 22
Zhan S, Horrocks J, Lukens LN (2006) Islands of co-expressed neighbouring genes in Arabidopsis thaliana suggest higher-order chromosome domains. The Plant Journal 45: 347-357
Zhan S, Lukens L (2010) Identification of novel miRNAs and miRNA dependent developmental shifts of gene expression in Arabidopsis thaliana. PLoS One 5: e10157
Zhu J, Zhang B, Smith EN, Drees B, Brem RB, Kruglyak L, Bumgarner RE, Schadt EE (2008) Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nat Genet 40: 854-861
Figure Legends
Figure 1 Images of selected tissues at time of sampling. Images of all 50 tissues can be found in Figure S6.
Figure 2 Flow chart detailing the annotation of the array platform. Probe sets that were predicted to cross-hybridize, were redundant, or did not show expression were removed from the analysis. The maize transcriptome network was constructed from 34,876 probe sets, of which 12,089 were unmapped and 22,787 were mapped to B73 maize gene models or unannotated regions of the maize B73 genome.
Figure 3 Average transcript abundance profiles of the 50 tissues/stages (means of biological replicate arrays) were clustered using the flashClust module from WGCNA. Developmental stages of the same tissue tend to cluster together. See Table I for definitions of developmental stages.
Figure 4: A heat map of module eigengene and tissue correlations. Boxes contain Pearson correlation coefficients and their associated p-values. A strong positive correlation (red) indicates that the ME has higher expression in the given tissue relative to all other tissues. A strong negative correlation (blue) indicates low expression in the given tissue relative to all other tissues. Tissues were classified into groups as described in Table I.
Figure 5: Module eigengene correlations with leaves sampled before and after anthesis and silking. All_Leaf_Stages represents all nine leaf samples. Vegetative leaves are the V1, V2, and V5 leaves. Reproductive leaves are samples from the second leaf above the top ear (R1 and 10, 17, 24, and 31 DAP). See Table I for definitions of developmental stages.
Table I: Description of developmental stages and tissues sampled for microarray analyses.
Developmental stage (a) | Tissue/organ | Tissue group | Number of visible leaves at sampling | Detail of the harvested sample
VE | leaf | leaf | 0 | coleoptile
VE | seminal root | root | 0 | root
V1 | leaf | leaf | 2 | 1st & 2nd leaf
V1 | seminal root | root | 2 | root
V2 | seminal root | root | 4 | seminal root
V2 | nodal root | root | 4 | nodal root
V2 | stalk | stalk | 4 | stalk
V2 | leaf | leaf | 4 | leaf (actively growing leaf – 4th leaf)
V4 | tassel | tassel | 6 | 1mm tassel meristem & 1mm uppermost stem below tassel
V5 | seminal root | root | 8 | seminal root
V5 | nodal root | root | 8 | nodal root
V5 | stalk | stalk | 8 | stalk below tassel (2cm)
V5 | leaf | leaf | 8 | leaf (actively growing leaf – 8th leaf, 15cm including tip)
V5 | tassel | tassel | 8 | tassel 3-5mm
V7 | ear | ear | 12 | top ear shoot
V7 | tassel | tassel | 12 | tassel 2cm
V8~V9 | tassel | tassel | 13~14 | tassel 12~14 cm
V8~V9 | ear | ear | 13~14 | top ear 3~5mm
V10~V11 | tassel | tassel | 15~16 | top 10cm of tassel (~20cm)
V10~V11 | ear | ear | 15~16 | top ear 1~1.5cm
V13~V15 | tassel | tassel | 15~16 | spikelet of tassel (~22cm)
V13~V15 | ear | ear | 15~16 | top ear 3~3.5cm
V15~V16 | floret | floret | 15~16 | top ear (5cm) floret
V15~V16 | cob | cob | 15~16 | top ear (5cm) cob
V15~V16 | silk | silk | 15~16 | top ear (5cm) silk
V15~V16 | tassel | tassel | 15~16 | spikelet of tassel (top 10cm)
VT | anthers | anthers | 15~16 | anther
R1 | ovule | ovule | 15~16 | R1-ovule of top ear
R1 | cob | cob | 15~16 | R1-cob of top ear
R1 | silk | silk | 15~16 | R1-silk of top ear
R1 | husk | husk | 15~16 | R1-most inner husk of top ear
R1 | leaf | leaf | 15~16 | R1-15cm tip of 2nd leaf above top ear
R1 | nodal root | root | 15~16 | R1-adult root
R1 | stalk | stalk | 15~16 | R1-15cm stalk below tassel
5DAP | ovule | ovule | 15~16 | ovule of top ear
5DAP | cob | cob | 15~16 | cob of top ear
10DAP | embryo | embryo | 15~16 | embryo of top ear
10DAP | endosperm | endosperm | 15~16 | endosperm of top ear
10DAP | leaf | leaf | 15~16 | 15cm tip of 2nd leaf above top ear
17DAP | embryo | embryo | 15~16 | embryo of top ear
17DAP | endosperm | endosperm | 15~16 | endosperm of top ear
17DAP | pericarp | pericarp | 15~16 | pericarp of top ear
17DAP | leaf | leaf | 15~16 | 15cm tip of 2nd leaf above top ear
24DAP | leaf | leaf | 15~16 | 15cm tip of 2nd leaf above top ear
24DAP | nodal root | root | 15~16 | nodal root
24DAP | pericarp | pericarp | 15~16 | pericarp of top ear
24DAP | embryo | embryo | 15~16 | embryo of top ear
24DAP | endosperm | endosperm | 15~16 | endosperm of top ear
31DAP | leaf | leaf | 15~16 | 15cm tip of 2nd leaf above top ear
31DAP | embryo | embryo | 15~16 | embryo of top ear
a We also measured days after seeding for developmental stages (VE – 4, V1 – 7, V2 – 14, V4 – 17, V5 – 27, V7 – 34, V8-V9 – 41, V10 – 48, V15 – 51, V16 – 54, VT – 54). R1 – one day before pollination. All DAP (days after pollination) samples were harvested based on date of pollination. All samples were collected around noon. The VE stage is germination and emergence. The Vn leaf stage refers to when the collar of the nth leaf is visible.
Table II: Numbers of tissue-specific genes reported by Sekhon et al. (2011) that are classified into tissue-specific modules.
Tissue (a) | Sekhon genes (b) | Module genes (c) | No. agree (d) | Proportion (d/c)
Cob | 4 | 0 | 0 | --
Embryo | 48 | 23 | 15 | 0.65
Endosperm | 168 | 48 | 35 | 0.73
Internode | 12 | 3 | 0 | 0.00
Leaf | 334 | 109 | 92 | 0.84
Root | 151 | 39 | 29 | 0.74
Silk | 12 | 5 | 0 | 0.00
Tassel | 134 | 49 | 35 | 0.71
Total | 863 | 276 | 206 | 0.75
a Sekhon et al. (2011) reported tissue-specific genes in these eight tissues.
b Count of genes reported by Sekhon et al. (2011) in each tissue.
c Count of genes in the present study that were mapped to Sekhon et al.’s tissue-specific genes.
d Number of genes in the present study that were in tissue-specific modules corresponding to Sekhon et al.’s tissue.
Table III: Modules with over-represented promoter motifs.
Promoter motif | Module p-value (× 10⁻³) (a); modules: Zm_mod01 (b), Zm_mod02, Zm_mod04 (c), Zm_mod06 (d), Zm_mod10 (e), Zm_mod23 (b)
AAAG | 6.471
AATAAA | 6.482
CC(A/G)CCC | 0.000, 2.099
CCCCCG | 0.525, 0.003
CCCCGG | 9.448
CGCGCC | 0.021
GCCCCGG | 0.744
TGGTTT | 4.376
TTTAAAAA | 8.545
(A/G)CCGAC | 1.665
a p-values are calculated from Fisher’s exact test (α < 0.01).
b Module is negatively correlated with leaf tissue.
c Module is negatively correlated with leaf and positively correlated with ear.
d Module is positively correlated with leaf tissue.
e Module is positively correlated with root tissue.
Table IV: Dispersion scores of network modules with more than 200 genes.
Module | Number of genes | Mean(a) | Variance(b) | Dispersion (observed) | Mean dispersion (randomized) | p-value(c)
Zm_mod01 | 1839 | 0.2670 | 0.3143 | 0.8495 | 0.8790 | 0.0303(d)
Zm_mod02 | 1424 | 0.2066 | 0.2271 | 0.9100 | 0.9038 | 0.6475
Zm_mod03 | 1587 | 0.2304 | 0.2590 | 0.8895 | 0.8940 | 0.3899
Zm_mod04 | 1526 | 0.2216 | 0.2431 | 0.9116 | 0.8976 | 0.8105
Zm_mod05 | 641 | 0.0930 | 0.0983 | 0.9462 | 0.9545 | 0.3085
Zm_mod06 | 963 | 0.1398 | 0.1540 | 0.9075 | 0.9330 | 0.0657
Zm_mod07 | 807 | 0.1172 | 0.1263 | 0.9278 | 0.9432 | 0.1824
Zm_mod08 | 533 | 0.0774 | 0.0807 | 0.9590 | 0.9619 | 0.4353
Zm_mod09 | 580 | 0.0842 | 0.0871 | 0.9669 | 0.9587 | 0.6873
Zm_mod10 | 497 | 0.0721 | 0.0773 | 0.9319 | 0.9645 | 0.0322(d)
Zm_mod11 | 484 | 0.0703 | 0.0755 | 0.9307 | 0.9655 | 0.0269(d)
Zm_mod12 | 417 | 0.0604 | 0.0631 | 0.9564 | 0.9701 | 0.2068
Zm_mod13 | 251 | 0.0364 | 0.0376 | 0.9684 | 0.9819 | 0.2304
a Mean number of module genes within a 300 kb window (100 kb step).
b Variance of module gene counts within a 300 kb window.
c Probability of finding an equal or lower dispersion in a sample of 100,000 networks where genes are assigned to modules at random.
d Significant at P < 0.05.
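The tabulated values are consistent with dispersion = mean / variance of the window counts (for Zm_mod01, 0.2670 / 0.3143 ≈ 0.8495), so scores below 1 indicate that module genes are more clustered along chromosomes than a Poisson expectation would predict. The sketch below illustrates that calculation together with a label-shuffling permutation test in the spirit of footnote c. It assumes a single chromosome, population variance, and one coordinate per gene; the helper names and toy data are hypothetical, not the authors' code.

```python
import random
from statistics import mean, pvariance


def window_counts(positions, chrom_length, window=300_000, step=100_000):
    """Count genes whose position falls in each sliding window [start, start + window)."""
    counts = []
    start = 0
    while start + window <= chrom_length:
        counts.append(sum(start <= p < start + window for p in positions))
        start += step
    return counts


def dispersion(positions, chrom_length):
    """Dispersion score = mean / variance of per-window gene counts."""
    counts = window_counts(positions, chrom_length)
    var = pvariance(counts)
    return mean(counts) / var if var > 0 else float("inf")


def permutation_test(gene_positions, labels, module, chrom_length, n_perm=1000, seed=0):
    """Shuffle module labels across genes; return (observed, mean randomized, p-value).

    The p-value is the fraction of shuffles whose dispersion is <= the observed value.
    """
    rng = random.Random(seed)
    members = [p for p, lab in zip(gene_positions, labels) if lab == module]
    observed = dispersion(members, chrom_length)
    shuffled = list(labels)
    random_scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # module sizes are preserved; only assignments change
        members = [p for p, lab in zip(gene_positions, shuffled) if lab == module]
        random_scores.append(dispersion(members, chrom_length))
    p_value = sum(d <= observed for d in random_scores) / n_perm
    return observed, mean(random_scores), p_value


# Toy data: 300 genes on a 5 Mb chromosome; module "m1" is confined to 1.0-1.6 Mb.
rng = random.Random(42)
positions = sorted(rng.randrange(0, 5_000_000) for _ in range(300))
labels = ["m1" if 1_000_000 <= p < 1_600_000 else "other" for p in positions]
obs, rand_mean, p = permutation_test(positions, labels, "m1", 5_000_000, n_perm=200)
print(f"observed={obs:.3f} randomized={rand_mean:.3f} p={p:.3f}")
```

Shuffling labels, rather than positions, keeps the real gene density along the chromosome, which is what "genes are assigned to modules at random" implies.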
Supplemental Figure Legends
Figure S1. A total of 150 arrays (three biological replicates of each tissue/stage) were clustered using the flashClust module from WGCNA. Array replicates often form tight clusters, and developmental stages of the same tissue tend to cluster together. See Table I for definitions of developmental stages.
Figure S2. A dendrogram and heat map of module eigengene correlations.
Figure S3. Pathways contained within the leaf-associated Zm_mod06 module. Three near-complete pathways associated with photosynthesis are represented by genes in this module (chlorophyllide a biosynthesis I, phytyl diphosphate biosynthesis, and chlorophyll a biosynthesis II).
Figure S4. A heat map of module eigengene and sample correlations. Boxes contain Pearson correlation coefficients and their associated p-values. A strong positive correlation (red) indicates that the ME has higher expression in the given tissue relative to all other tissues. A strong negative correlation (blue) indicates low expression in the given tissue relative to all other tissues.
Figure S5. Expression of two genes in the embryo-associated Zm_mod13 module. A. Transcripts of the unannotated gene Zm028519_at are highly correlated with the ME. Note that the point at (0, 0) represents no detectable expression of Zm028519_at in 46 non-embryo tissues. B. GRMZM2G074097 ("Thiazole biosynthetic enzyme 1-2, chloroplastic") follows a pattern of increasing transcript abundance during embryo development.
Figure S6. Images of 50 tissues at time of sampling.
Figure S7. Soft-thresholding plots. A. Scale independence: the fit of the resulting network to a scale-free topology for a range of values of the soft-thresholding power β. The horizontal line (r = 0.75) marks the target fit to a scale-free topology. B. Mean connectivity of the resulting network for the same range of β. As β increases, the mean connectivity of the network decreases.
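Figure S7 summarizes the choice of the soft-thresholding power β. In WGCNA's standard unsigned construction, adjacency is a_ij = |cor(x_i, x_j)|^β, connectivity k_i is the row sum of adjacencies, and scale-free fit is judged by regressing log10 p(k) on log10 k. The sketch below is a simplified pure-Python approximation of those quantities (equal-width connectivity bins, R² of the log-log fit), not WGCNA's pickSoftThreshold implementation; the unsigned network type, the bin count, and the random toy data are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_adjacency(profiles, beta):
    """Unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)|**beta, zero diagonal."""
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(profiles[i], profiles[j])) ** beta
    return adj


def scale_free_r2(k, n_bins=10):
    """R^2 and slope of log10(fraction of nodes) vs. log10(mean k) over equal-width bins."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0
    bins = [[] for _ in range(n_bins)]
    for v in k:
        bins[min(int((v - lo) / width), n_bins - 1)].append(v)
    used = [b for b in bins if b and sum(b) > 0]  # skip empty bins (log undefined)
    xs = [math.log10(sum(b) / len(b)) for b in used]
    ys = [math.log10(len(b) / len(k)) for b in used]
    if len(xs) < 3 or len(set(xs)) < 2 or len(set(ys)) < 2:
        return float("nan"), float("nan")
    r = pearson(xs, ys)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return r * r, slope


# Toy data: 60 random "genes" measured in 50 samples.
rng = random.Random(0)
profiles = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(60)]
for beta in (1, 4, 8):
    k = [sum(row) for row in soft_adjacency(profiles, beta)]
    r2, slope = scale_free_r2(k)
    print(f"beta={beta}: mean connectivity={sum(k) / len(k):.3f}, R^2={r2:.2f}, slope={slope:.2f}")
```

Because every |cor| lies between 0 and 1, raising β can only shrink each adjacency, which is why panel B shows mean connectivity decaying as β grows.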
Supplemental Tables
Supplemental Table 1. Expression profiles across arrays for 34,876 probe sets.
Supplemental Table 2. Module memberships of the 34,876 probe sets. The table also notes whether genes putatively encode a protein with a function in transcription (0 = No; 1 = Yes), their Gene Ontology annotations, and which probe sets mapped to B73 gene models.
Supplemental Table 3. Validation of modules based on mean topological overlap.
Supplemental Table 4. The 276 tissue-specific genes reported by Sekhon et al. (2011) and their module memberships.
Supplemental Table 5. Gene Ontology terms over-represented within modules.
Supplemental Table 6. Rank of the transcription-related gene with the highest module membership (MM) in each module.
Supplemental Table 7. Modules with over-represented promoter motifs, within promoter regions sized 1.0 kb, 1.5 kb, and 2.0 kb.
Supplemental Table 3: Validation of modules based on mean topological overlap.
Module(a) | Mean TO observed(b) | Mean TO random(c) | p-value(d)
Zm_mod01 | 0.1548 | 0.0635 | < 2 × 10^-5
Zm_mod02 | 0.1358 | 0.0635 | < 2 × 10^-5
Zm_mod03 | 0.1106 | 0.0638 | < 2 × 10^-5
Zm_mod04 | 0.1160 | 0.0635 | < 2 × 10^-5
Zm_mod05 | 0.1382 | 0.0636 | < 2 × 10^-5
Zm_mod06 | 0.1301 | 0.0637 | < 2 × 10^-5
Zm_mod07 | 0.0946 | 0.0637 | < 2 × 10^-5
Zm_mod08 | 0.0848 | 0.0638 | < 2 × 10^-5
Zm_mod09 | 0.0838 | 0.0639 | < 2 × 10^-5
Zm_mod10 | 0.0778 | 0.0640 | < 2 × 10^-5
Zm_mod11 | 0.0856 | 0.0640 | < 2 × 10^-5
Zm_mod12 | 0.1557 | 0.0645 | < 2 × 10^-5
Zm_mod13 | 0.0720 | 0.0651 | < 2 × 10^-5
Zm_mod14 | 0.0864 | 0.0649 | < 2 × 10^-5
Zm_mod15 | 0.1242 | 0.0650 | < 2 × 10^-5
Zm_mod16 | 0.0846 | 0.0675 | < 2 × 10^-5
Zm_mod17 | 0.0943 | 0.0679 | < 2 × 10^-5
Zm_mod18 | 0.0735 | 0.0687 | 0.0200
Zm_mod19 | 0.0963 | 0.0700 | < 2 × 10^-5
Zm_mod20 | 0.1036 | 0.0714 | < 2 × 10^-5
Zm_mod21 | 0.0900 | 0.0716 | < 2 × 10^-5
Zm_mod22 | 0.0884 | 0.0764 | < 2 × 10^-5
Zm_mod23 | 0.0971 | 0.0789 | < 2 × 10^-5
Zm_mod24 | 0.1020 | 0.0815 | < 2 × 10^-5
Zm_mod25 | 0.0621 | 0.0670 | 0.9802
Zm_mod26 | 0.0678 | 0.0662 | 0.1602
Zm_mod27 | 0.0563 | 0.0689 | 1.0000
Zm_mod28 | 0.0725 | 0.0682 | 0.0598
Zm_mod29 | 0.0728 | 0.0690 | 0.1196
Zm_mod30 | 0.0741 | 0.0697 | 0.0999
Zm_mod31 | 0.0643 | 0.0698 | 1.0000
Zm_mod32 | 0.0635 | 0.0704 | 1.0000
Zm_mod33 | 0.0667 | 0.0711 | 0.9001
Zm_mod34 | 0.0652 | 0.0705 | 0.9799
Zm_mod35 | 0.0697 | 0.0711 | 0.6198
Zm_mod36 | 0.0686 | 0.0724 | 0.8200
Zm_mod37 | 0.0757 | 0.0726 | 0.1404
Zm_mod38 | 0.0747 | 0.0758 | 0.5794
Zm_mod39 | 0.0672 | 0.0792 | 1.0000
Zm_mod40 | 0.0697 | 0.0790 | 0.9800
Zm_mod41 | 0.0920 | 0.0824 | 0.0597
Zm_mod42 | 0.0605 | 0.0815 | 1.0000
Zm_mod43 | 0.0835 | 0.0831 | 0.4004
Zm_mod44 | 0.0871 | 0.0838 | 0.1601
Zm_mod45 | 0.0788 | 0.0851 | 0.8801
Zm_mod46 | 0.0747 | 0.0930 | 1.0000
Zm_mod47 | 0.0786 | 0.0940 | 1.0000
Zm_mod48 | 0.0887 | 0.0926 | 0.7997
Zm_mod49 | 0.0815 | 0.0925 | 0.9800
a Modules are ordered by size.
b Mean topological overlap (TO) score for the genes in the module.
c Mean TO score over 50,000 iterations of randomly selected genes.
d Probability of finding a greater or equal mean TO in a sample of 50,000 collections of modules comprised of genes selected at random.
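Supplemental Table 3 compares each module's mean topological overlap with that of randomly drawn gene sets. The sketch below uses the WGCNA-style definition TO_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = Σ_u a_iu·a_uj and k_i is connectivity, plus a resampling p-value matching the wording of footnote d. It assumes a symmetric adjacency matrix with a zero diagonal; the function names and the toy network are hypothetical.

```python
import random
from itertools import combinations


def topological_overlap(adj):
    """TO matrix for a symmetric adjacency matrix with zero diagonal."""
    n = len(adj)
    k = [sum(row) for row in adj]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        shared = sum(adj[i][u] * adj[u][j] for u in range(n))  # diagonal terms are 0
        tom[i][j] = tom[j][i] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


def mean_to(tom, genes):
    """Mean pairwise TO among a set of gene indices."""
    pairs = list(combinations(genes, 2))
    return sum(tom[i][j] for i, j in pairs) / len(pairs)


def to_permutation_test(tom, module, n_iter=50_000, seed=0):
    """Compare a module's mean TO with random gene sets of the same size.

    Returns (observed, mean random, p), where p is the fraction of random sets
    whose mean TO is >= the observed value.
    """
    rng = random.Random(seed)
    observed = mean_to(tom, module)
    scores = [mean_to(tom, rng.sample(range(len(tom)), len(module))) for _ in range(n_iter)]
    p = sum(s >= observed for s in scores) / n_iter
    return observed, sum(scores) / n_iter, p


# Toy network: genes 0-3 are tightly connected, all other pairs weakly.
n = 8
adj = [[0.0] * n for _ in range(n)]
for i, j in combinations(range(n), 2):
    adj[i][j] = adj[j][i] = 0.9 if (i < 4 and j < 4) else 0.1
tom = topological_overlap(adj)
print(to_permutation_test(tom, [0, 1, 2, 3], n_iter=2000))
```

With this counting rule, a module never beaten by any of 50,000 random sets gets p = 0; reporting it as "< 2 × 10^-5" (that is, below 1/50,000), as the table does, is the more honest bound.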
Supplemental Table 6: Rank of the transcription-related gene with the highest module membership (MM) in each module.
Module | Module size | Rank of highest TF | p-value, random order(a) | p-value, random genes from genome(b)
Zm_mod01 | 2301 | 2 | 0.1043 | 0.1110
Zm_mod02 | 2050 | 2 | 0.0995 | 0.1126
Zm_mod03 | 1920 | 17 | 0.6372 | 0.6365
Zm_mod04 | 1793 | 36 | 0.9367 | 0.8825
Zm_mod05 | 971 | 26 | 0.8468 | 0.7889
Zm_mod06 | 1259 | 74 | 0.9259 | 0.9880
Zm_mod07 | 916 | 13 | 0.6177 | 0.5418
Zm_mod08 | 728 | 5 | 0.2929 | 0.2584
Zm_mod09 | 655 | 1 | 0.0644 | 0.0577
Zm_mod10 | 579 | 47 | 0.9109 | 0.9410
Zm_mod11 | 607 | 14 | 0.6107 | 0.5667
Zm_mod12 | 538 | 25 | 0.5620 | 0.7771
Zm_mod13 | 338 | 12 | 0.7164 | 0.5122
Zm_mod14 | 192 | 12 | 0.4836 | 0.5124
Zm_mod15 | 152 | 5 | 0.0582 | 0.2589
Zm_mod16 | 156 | 12 | 0.5992 | 0.5100
Zm_mod17 | 67 | 1 | 0.0457 | 0.0580
Zm_mod18 | 74 | 9 | 0.4088 | 0.4150
Zm_mod19 | 66 | 2 | 0.1617 | 0.1134
Zm_mod20 | 45 | 21 | 0.9682 | 0.7141
Zm_mod21 | 47 | 5 | 0.5071 | 0.2588
Zm_mod22 | 26 | 1 | 0.1581 | 0.0583
Zm_mod23 | 27 | 22 | 0.8077 | 0.7317
Zm_mod24 | 20 | 1 | 0.1054 | 0.0590
a "Random order" values are the probability of obtaining an equal or higher rank when the genes in the module are placed in random order (based on 100,000 iterations).
b "Random genes from genome" values are the probability of obtaining an equal or higher rank when [module size] genes are selected randomly from all genes in the data set (based on 100,000 iterations).
Complete list of transcription-related GO categories:
GO:0000122 negative regulation of transcription from RNA polymerase II promoter
GO:0000467 exonucleolytic trimming to generate mature 3'-end of 5.8S rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA)
GO:0002103 endonucleolytic cleavage of tetracistronic rRNA transcript (SSU-rRNA, LSU-rRNA, 4.5S-rRNA, 5S-rRNA)
GO:0003700 sequence-specific DNA binding transcription factor activity
GO:0003702 RNA polymerase II transcription factor activity
GO:0003711 transcription elongation regulator activity
GO:0003712 transcription cofactor activity
GO:0003713 transcription coactivator activity
GO:0003714 transcription corepressor activity
GO:0003715 transcription termination factor activity
GO:0005667 transcription factor complex
GO:0005669 transcription factor TFIID complex
GO:0005672 transcription factor TFIIA complex
GO:0005673 transcription factor TFIIE complex
GO:0005674 transcription factor TFIIF complex
GO:0006283 transcription-coupled nucleotide-excision repair
GO:0006350 transcription
GO:0006351 transcription, DNA-dependent
GO:0006352 transcription initiation, DNA-dependent
GO:0006353 transcription termination, DNA-dependent
GO:0006354 transcription elongation, DNA-dependent
GO:0006355 regulation of transcription, DNA-dependent
GO:0006357 regulation of transcription from RNA polymerase II promoter
GO:0006366 transcription from RNA polymerase II promoter
GO:0006367 transcription initiation from RNA polymerase II promoter
GO:0006368 transcription elongation from RNA polymerase II promoter
GO:0006383 transcription from RNA polymerase III promoter
GO:0006410 transcription, RNA-dependent
GO:0008023 transcription elongation factor complex
GO:0008134 transcription factor binding
GO:0008159 positive transcription elongation factor activity
GO:0009303 rRNA transcription
GO:0016251 general RNA polymerase II transcription factor activity
GO:0016480 negative regulation of transcription from RNA polymerase III promoter
GO:0016481 negative regulation of transcription
GO:0016563 transcription activator activity
GO:0016564 transcription repressor activity
GO:0016566 specific transcriptional repressor activity
GO:0016986 transcription initiation factor activity
GO:0017163 basal transcription repressor activity
GO:0030528 transcription regulator activity
GO:0032583 regulation of gene-specific transcription
GO:0032968 positive regulation of transcription elongation from RNA polymerase II promoter
GO:0045449 regulation of transcription
GO:0045892 negative regulation of transcription, DNA-dependent
GO:0045941 positive regulation of transcription
GO:0045944 positive regulation of transcription from RNA polymerase II promoter
GO:0048096 chromatin-mediated maintenance of transcription
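Supplemental Table 6 asks whether the top-ranked transcription-related gene sits unusually high in a module's membership (MM) ranking. If a module of n genes contains t transcription-related genes and the MM order is random, the probability that the best of them has rank r or better (rank 1 = highest MM) has a closed form, P = 1 - C(n - r, t) / C(n, t). The sketch below computes it and checks it by simulation, analogous to the "random order" column. Reading "equal or higher rank" as "rank number ≤ r" is an assumption, and so is the value of t in the usage line, since the number of transcription-related genes per module is not given here.

```python
import random
from math import comb


def p_best_rank_at_most(n_genes, n_tfs, rank):
    """P(the best-ranked of n_tfs randomly placed TFs among n_genes has rank <= rank)."""
    if n_tfs == 0:
        return 0.0
    # Complement: all n_tfs TFs fall among the n_genes - rank lower-ranked positions.
    return 1 - comb(n_genes - rank, n_tfs) / comb(n_genes, n_tfs)


def simulate_best_rank(n_genes, n_tfs, rank, n_iter=100_000, seed=1):
    """Monte Carlo estimate of the same probability by drawing random TF ranks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        best = min(rng.sample(range(1, n_genes + 1), n_tfs))
        hits += best <= rank
    return hits / n_iter


# Hypothetical example: a 655-gene module whose top TF has rank 1, assuming 40 TFs.
print(p_best_rank_at_most(655, 40, 1), simulate_best_rank(655, 40, 1, n_iter=20_000))
```

For rank 1 the formula reduces to t / n, so small p-values at rank 1 mainly reflect modules with few transcription-related genes.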
Supplemental Table 7: Modules with over-represented promoter motifs, within promoter regions sized 1.0 kb, 1.5 kb, and 2.0 kb.

Promoter region = 1.0 kb
Modules: Zm_mod01, Zm_mod02, Zm_mod05, Zm_mod08, Zm_mod18
Promoter motif | p-value (× 10^-3)(a)
GTGCCC(A/T)(A/T)(f) | 0.248
CC(A/G)CCC | 1.076
GTGCCCTT(f) | 8.348
AAAG | 8.876
AATAAA | 6.989
CCAAT | 2.583
CC(A/T)ACC | 3.046
TATATAT | 0.955
CACGTC | 7.571
CCCCCG | 6.299
CGCGCC | 0.343
CGTGG | 2.828
GCCCCGG | 7.889
a p-values are calculated from Fisher's Exact Test (α < 0.01).

Promoter region = 1.5 kb
Modules: Zm_mod01, Zm_mod02, Zm_mod05, Zm_mod15, Zm_mod18
Promoter motif | p-value (× 10^-3)(a)
AAACCA |
CC(G/A)CCC | 5.578
GTGCCCTT(f) | 0.003
CC(A/G)CCC | 8.464
CCAAT | 8.740
CCCCGG |
CGCGCC | 7.027
CGTGG | 2.109
GCCCCGG | 3.702
TATATAT | 0.415
TGGTTT | 5.790
a p-values are calculated from Fisher's Exact Test (α < 0.01).

Promoter region = 2.0 kb
Modules: Zm_mod01, Zm_mod02, Zm_mod09, Zm_mod13, Zm_mod15, Zm_mod21
Promoter motif | p-value (× 10^-3)(a)
AATAAA | 0.007
CCCCGG | 8.574
GCCCCGG | 4.540
GTGCCC(A/T)(A/T) | 9.896
TATATAT | 2.156
TGAGTCA | 6.459
TGGTTT | 0.672
a p-values are calculated from Fisher's Exact Test (α < 0.01).
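Supplemental Table 7 repeats the motif test for promoter regions of 1.0, 1.5, and 2.0 kb. The helpers below sketch how such promoter sequences might be extracted upstream of a transcription start site (TSS) and scanned for motifs written in the tables' notation, where (A/G) means either base. The 0-based TSS convention, strand handling, scanning both DNA strands, and all function names are assumptions for illustration; presence/absence calls like these would supply the counts for a Fisher's exact test.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Translate e.g. 'CC(A/G)CCC' into the regular expression 'CC[AG]CCC'."""
    return re.sub(r"\(([ACGT](?:/[ACGT])+)\)",
                  lambda m: "[" + m.group(1).replace("/", "") + "]", motif)


def promoter(chrom_seq, tss, strand, length):
    """Return `length` bp upstream of a 0-based TSS, oriented 5'->3' on the gene's strand."""
    if strand == "+":
        return chrom_seq[max(0, tss - length):tss]
    # Minus-strand genes: upstream lies at higher coordinates; reverse-complement it.
    return revcomp(chrom_seq[tss + 1:tss + 1 + length])


def has_motif(seq, motif, both_strands=True):
    """True if the motif occurs in seq (and, optionally, in its reverse complement)."""
    pattern = re.compile(motif_to_regex(motif))
    if pattern.search(seq):
        return True
    return both_strands and pattern.search(revcomp(seq)) is not None


chrom = "GGGCCACCCTTAAAGATGCC"  # toy sequence; hypothetical TSS at index 15 on "+"
for size in (5, 10, 15):
    seq = promoter(chrom, 15, "+", size)
    print(size, seq, has_motif(seq, "CC(A/G)CCC"))
```

The toy loop shows why window size matters: a motif lying far upstream is counted only once the promoter window is long enough to include it.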
Figure 1: Selected tissues at time of sampling. A. VE leaf and root; B. V1 leaf and root; C. V2 leaf, stalk, seminal and nodal root; D. V4 tassel; E. V4 tassel primordium; F. V5 tassel; G. V8 tassel and top ear shoot; H. V10 tassel and top ear shoot; I. V13 top ear; J. V15 silk; K. V15 cob and floret; L. V15 tassel; M. VT anthers and pollen; N. R1 root; O. R1 stalk; P. R1 leaf; Q. 10 DAP embryo and endosperm; R. 17 DAP milky endosperm; S. 24 DAP leaf; T. 24 DAP root; U. 24 DAP embryo, endosperm, and pericarp; V. 31 DAP leaf; W. 31 DAP embryo, endosperm, and pericarp.
[Flow chart: expressed probe sets are aligned to B73 transcripts (BLAST); those without a match are aligned through their Unigenes to the B73 genome (Exonerate); matching transcripts or Unigenes define gene models; cross-hybridizing and redundant probe sets are removed, leaving 22,787 identified probe sets; together with the unmapped probe sets, 34,876 probe sets are used in the analysis.]
Figure 2. Flow chart detailing the annotation of the array platform. Probe sets which were predicted to cross-hybridize, were redundant, or did not show expression were removed from the analysis. The maize transcriptome network was constructed from 34,876 probe sets, of which 12,089 were unmapped, and 22,787 were mapped to B73 maize gene models or unannotated regions of the maize B73 genome.
[Dendrogram of the 50 tissue/stage samples, with leaves labelled tissue_stage (e.g., anthers_VT, leaf_V1, embryo_10DAP, root_nodal_V2, cob_5DAP); y-axis: Height.]
Figure 3. Average transcript abundances of the 50 tissues/stages (each the mean of three biological replicate arrays; see Table I) were clustered using the flashClust module from WGCNA. Developmental stages of the same tissue tend to cluster together. See Table I for definitions of developmental stages.
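Figures 3 and S1 rest on agglomerative hierarchical clustering, where "Height" is the distance at which two clusters merge. The linkage method and distance measure are not stated here; the sketch below assumes average linkage on Euclidean distances between sample profiles (the combination used in the WGCNA tutorials for sample trees) and implements it naively in pure Python to show where merge heights come from. It is not flashClust, and the toy samples are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, labels):
    """Naive average-linkage clustering; returns (left, right, height) merge records."""
    d = [[euclidean(p, q) for q in profiles] for p in profiles]
    clusters = [[i] for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                h = sum(d[i][j] for i in clusters[a] for j in clusters[b]) / (
                    len(clusters[a]) * len(clusters[b]))
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merges.append(([labels[i] for i in clusters[a]], [labels[i] for i in clusters[b]], h))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return merges


samples = {"leaf_V1": [5.0, 1.0, 0.2], "leaf_V2": [4.8, 1.1, 0.3],
           "root_V1": [0.5, 4.0, 2.0], "root_V2": [0.6, 3.8, 2.2]}
for left, right, height in average_linkage(list(samples.values()), list(samples)):
    print(left, right, round(height, 3))
```

In the toy data, the two leaf samples and the two root samples merge at low heights before the tissues join, mirroring the pattern the caption describes.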
[Figure 4 heat map (color scale -1 to 1). Cells give Pearson r (p-value) between each module eigengene and each tissue.]
Module eigengene | Anthers | Cob | Ear | Embryo | Endosperm | Floret | Husk | Leaf | Ovule | Pericarp | Root | Silk | Stalk | Tassel
MEZm_mod13 | -0.013 (0.9) | -0.066 (0.6) | -0.049 (0.7) | 0.86 (6e-16) | -0.016 (0.9) | -0.045 (0.8) | -0.052 (0.7) | -0.17 (0.2) | -0.047 (0.7) | 0.067 (0.6) | -0.16 (0.3) | -0.053 (0.7) | -0.082 (0.6) | -0.12 (0.4)
MEZm_mod18 | -0.011 (0.9) | 0.053 (0.7) | 0.11 (0.4) | 0.12 (0.4) | 0.14 (0.3) | 0.033 (0.8) | -0.027 (0.9) | -0.29 (0.04) | 0.074 (0.6) | 0.12 (0.4) | 0.071 (0.6) | 0.01 (0.9) | -0.4 (0.004) | 0.1 (0.5)
MEZm_mod23 | -0.0061 (1) | 0.053 (0.7) | 0.096 (0.5) | 0.083 (0.6) | 0.075 (0.6) | 0.044 (0.8) | -0.0025 (1) | -0.4 (0.004) | 0.047 (0.7) | 0.069 (0.6) | 0.066 (0.7) | 0.025 (0.9) | 0.0022 (1) | 0.068 (0.6)
MEZm_mod02 | -0.99 (7e-40) | 0.05 (0.7) | 0.095 (0.5) | 0.076 (0.6) | 0.0064 (1) | 0.042 (0.8) | 0.0042 (1) | 0.022 (0.9) | 0.056 (0.7) | 0.044 (0.8) | 0.041 (0.8) | 0.027 (0.8) | 0.041 (0.8) | 0.041 (0.8)
MEZm_mod19 | -0.022 (0.9) | 0.065 (0.7) | 0.086 (0.6) | 0.078 (0.6) | 0.067 (0.6) | 0.043 (0.8) | 0.018 (0.9) | -0.39 (0.005) | 0.062 (0.7) | 0.052 (0.7) | 0.05 (0.7) | 0.038 (0.8) | 0.013 (0.9) | 0.081 (0.6)
MEZm_mod03 | -0.13 (0.4) | 0.2 (0.2) | 0.18 (0.2) | 0.097 (0.5) | -0.044 (0.8) | 0.11 (0.5) | 0.11 (0.4) | -0.76 (1e-10) | 0.16 (0.3) | 0.036 (0.8) | 0.17 (0.2) | 0.14 (0.3) | 0.03 (0.8) | 0.18 (0.2)
MEZm_mod01 | -0.47 (6e-04) | 0.11 (0.4) | 0.3 (0.04) | 0.23 (0.1) | 0.13 (0.4) | 0.13 (0.4) | -0.034 (0.8) | -0.58 (9e-06) | 0.14 (0.3) | 0.12 (0.4) | 0.0039 (1) | 0.044 (0.8) | -0.038 (0.8) | 0.084 (0.6)
MEZm_mod04 | -0.13 (0.4) | 0.16 (0.3) | 0.52 (1e-04) | 0.18 (0.2) | -0.12 (0.4) | 0.24 (0.09) | -0.043 (0.8) | -0.44 (0.001) | 0.11 (0.4) | -0.13 (0.4) | -0.23 (0.1) | 0.058 (0.7) | -0.052 (0.7) | 0.15 (0.3)
MEZm_mod21 | -0.002 (1) | -0.054 (0.7) | -0.077 (0.6) | -0.074 (0.6) | -0.048 (0.7) | -0.028 (0.8) | 0.0096 (0.9) | -0.064 (0.7) | -0.053 (0.7) | -0.06 (0.7) | -0.11 (0.4) | -0.044 (0.8) | -0.047 (0.7) | 0.51 (1e-04)
MEZm_mod22 | -0.027 (0.9) | -0.027 (0.9) | -0.048 (0.7) | -0.057 (0.7) | -0.041 (0.8) | -0.023 (0.9) | 0.06 (0.7) | -0.055 (0.7) | -0.021 (0.9) | -0.041 (0.8) | -0.062 (0.7) | -0.032 (0.8) | -0.038 (0.8) | 0.36 (0.01)
MEZm_mod07 | 0.046 (0.8) | -0.069 (0.6) | -0.39 (0.005) | -0.37 (0.007) | -0.31 (0.03) | -0.16 (0.3) | 0.061 (0.7) | 0.44 (0.001) | -0.051 (0.7) | -0.11 (0.5) | 0.5 (2e-04) | -0.048 (0.7) | 0.1 (0.5) | -0.089 (0.5)
MEZm_mod16 | -0.012 (0.9) | 0.1 (0.5) | 0.03 (0.8) | -0.064 (0.7) | -0.96 (3e-27) | 0.021 (0.9) | 0.075 (0.6) | 0.16 (0.3) | 0.077 (0.6) | -0.05 (0.7) | 0.2 (0.2) | 0.047 (0.7) | 0.079 (0.6) | 0.11 (0.4)
MEZm_mod10 | 0.069 (0.6) | 0.12 (0.4) | -0.34 (0.02) | -0.32 (0.03) | -0.24 (0.09) | -0.099 (0.5) | 0.25 (0.07) | -0.11 (0.5) | 0.074 (0.6) | -0.065 (0.7) | 0.53 (9e-05) | 0.11 (0.5) | 0.14 (0.3) | 0.013 (0.9)
MEZm_mod11 | 0.034 (0.8) | -0.12 (0.4) | -0.18 (0.2) | -0.14 (0.3) | -0.11 (0.4) | -0.096 (0.5) | -0.041 (0.8) | -0.19 (0.2) | -0.1 (0.5) | -0.087 (0.5) | 0.96 (2e-29) | -0.074 (0.6) | -0.057 (0.7) | -0.17 (0.2)
MEZm_mod20 | -0.014 (0.9) | -0.092 (0.5) | -0.12 (0.4) | -0.13 (0.4) | -0.11 (0.4) | -0.063 (0.7) | -0.039 (0.8) | -0.14 (0.3) | -0.073 (0.6) | -0.077 (0.6) | 0.77 (4e-11) | -0.063 (0.7) | -0.043 (0.8) | -0.14 (0.3)
MEZm_mod24 | -0.01 (0.9) | -0.045 (0.8) | -0.051 (0.7) | -0.038 (0.8) | 0.6 (5e-06) | -0.025 (0.9) | -0.026 (0.9) | -0.071 (0.6) | -0.033 (0.8) | -0.013 (0.9) | -0.073 (0.6) | -0.035 (0.8) | -0.038 (0.8) | -0.064 (0.7)
MEZm_mod05 | 0.45 (0.001) | -0.19 (0.2) | -0.27 (0.06) | -0.092 (0.5) | 0.45 (0.001) | -0.12 (0.4) | -0.036 (0.8) | 0.32 (0.03) | -0.2 (0.2) | 0.13 (0.4) | -0.093 (0.5) | -0.038 (0.8) | -0.026 (0.9) | -0.21 (0.1)
MEZm_mod17 | -0.015 (0.9) | -0.046 (0.8) | -0.05 (0.7) | -0.023 (0.9) | 0.6 (4e-06) | -0.023 (0.9) | -0.026 (0.9) | -0.075 (0.6) | -0.04 (0.8) | -0.002 (1) | -0.08 (0.6) | -0.033 (0.8) | -0.035 (0.8) | -0.069 (0.6)
MEZm_mod08 | 0.3 (0.03) | -0.11 (0.4) | -0.086 (0.6) | -0.0021 (1) | 0.81 (1e-12) | -0.045 (0.8) | -0.058 (0.7) | -0.11 (0.4) | -0.091 (0.5) | 0.13 (0.4) | -0.18 (0.2) | -0.056 (0.7) | -0.083 (0.6) | -0.12 (0.4)
MEZm_mod09 | -0.019 (0.9) | -0.12 (0.4) | -0.093 (0.5) | 0.43 (0.002) | 0.69 (3e-08) | -0.054 (0.7) | -0.081 (0.6) | -0.31 (0.03) | -0.049 (0.7) | 0.45 (0.001) | -0.18 (0.2) | -0.076 (0.6) | -0.14 (0.3) | -0.17 (0.2)
MEZm_mod12 | 0.55 (4e-05) | -0.065 (0.7) | -0.11 (0.5) | -0.085 (0.6) | -0.07 (0.6) | -0.043 (0.8) | -0.0025 (1) | -0.13 (0.4) | -0.042 (0.8) | -0.048 (0.7) | -0.099 (0.5) | -0.036 (0.8) | -0.067 (0.6) | 0.4 (0.004)
MEZm_mod15 | 0.38 (0.007) | -0.086 (0.6) | -0.11 (0.5) | -0.1 (0.5) | -0.04 (0.8) | -0.056 (0.7) | -0.043 (0.8) | 0.42 (0.002) | -0.075 (0.6) | -0.055 (0.7) | -0.1 (0.5) | -0.069 (0.6) | -0.056 (0.7) | -0.11 (0.5)
MEZm_mod06 | -0.048 (0.7) | -0.11 (0.4) | -0.18 (0.2) | -0.19 (0.2) | -0.2 (0.2) | -0.095 (0.5) | 0.0041 (1) | 0.83 (6e-14) | -0.12 (0.4) | -0.15 (0.3) | -0.28 (0.05) | -0.049 (0.7) | 0.19 (0.2) | -0.053 (0.7)
MEZm_mod14 | -0.29 (0.04) | 0.074 (0.6) | 0.17 (0.2) | -0.1 (0.5) | -0.34 (0.01) | 0.072 (0.6) | 0.044 (0.8) | 0.61 (2e-06) | 0.0069 (1) | -0.24 (0.1) | -0.58 (9e-06) | 0.061 (0.7) | 0.2 (0.2) | 0.089 (0.5)
Figure 4. A heat map of module eigengene and tissue correlations. Boxes contain Pearson correlation coefficients and their associated p-values. A strong positive correlation (red) indicates that the ME has higher expression in the given tissue relative to all other tissues. A strong negative correlation (blue) indicates low expression in the given tissue relative to all other tissues. Tissues were classified into groups as described in Table I.
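Figure 4 correlates each module eigengene (ME) with tissue membership. In WGCNA, an ME is the first principal component of a module's standardized expression, and a tissue can be coded as a 0/1 indicator across samples. The sketch below computes an ME by power iteration in pure Python, aligns its sign with the module's average standardized expression (WGCNA's default alignment), and returns its Pearson correlation with a tissue indicator plus the Student t statistic t = r·sqrt((n - 2) / (1 - r²)), from which a two-sided p-value with n - 2 degrees of freedom follows (the approach of WGCNA's corPvalueStudent). The 0/1 coding, function names, and toy data are assumptions, not the authors' pipeline.

```python
import math


def zscore(row):
    """Standardize one profile across samples (population SD)."""
    m = sum(row) / len(row)
    sd = math.sqrt(sum((v - m) ** 2 for v in row) / len(row))
    return [(v - m) / sd if sd > 0 else 0.0 for v in row]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def module_eigengene(expr, n_iter=200):
    """First principal component (over samples) of a genes-by-samples matrix."""
    z = [zscore(g) for g in expr]
    n = len(z[0])
    cross = [[sum(g[i] * g[j] for g in z) for j in range(n)] for i in range(n)]  # Z^T Z
    average = [sum(g[i] for g in z) / len(z) for i in range(n)]
    # Start near the mean profile (a vector of ones would be in the null space of Z^T Z).
    v = average if any(average) else list(z[0])
    for _ in range(n_iter):
        v = [sum(cross[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    me = zscore(v)
    if any(average) and pearson(me, average) < 0:  # align sign with average expression
        me = [-x for x in me]
    return me


def tissue_correlation(me, tissue_labels, tissue):
    """Pearson r between an ME and a 0/1 tissue indicator, plus the Student t statistic."""
    indicator = [1.0 if t == tissue else 0.0 for t in tissue_labels]
    r = pearson(me, indicator)
    n = len(me)
    t_stat = r * math.sqrt((n - 2) / (1 - r * r)) if abs(r) < 1 else math.copysign(math.inf, r)
    return r, t_stat


expr = [[5.0, 6.0, 1.0, 1.2, 0.9, 1.1],
        [4.0, 5.0, 0.5, 0.7, 0.6, 0.4],
        [10.0, 9.0, 2.0, 2.1, 1.9, 2.2]]
tissues = ["Leaf", "Leaf", "Root", "Root", "Ear", "Ear"]
me = module_eigengene(expr)
print(tissue_correlation(me, tissues, "Leaf"))
```

Because the indicator contrasts one tissue with all others, a high r means the module is up in that tissue relative to every other tissue, which is how the caption reads the red cells.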
[Figure 5 heat map (color scale -1 to 1). Cells give Pearson r (p-value) between each module eigengene and each leaf sample group.]
Module eigengene | All_Leaf_Stages | Vegetative_Leaves | Reproductive_Leaves
MEZm_mod13 | -0.16 (0.3) | -0.09 (0.5) | -0.12 (0.4)
MEZm_mod18 | -0.29 (0.04) | -0.44 (0.001) | 0.08 (0.6)
MEZm_mod23 | -0.42 (0.003) | -0.65 (4e-07) | 0.0097 (0.9)
MEZm_mod02 | 0.021 (0.9) | -0.047 (0.7) | 0.061 (0.7)
MEZm_mod19 | -0.42 (0.003) | -0.65 (4e-07) | -0.026 (0.9)
MEZm_mod03 | -0.84 (5e-14) | -0.37 (0.009) | -0.81 (7e-13)
MEZm_mod01 | -0.59 (5e-06) | -0.66 (2e-07) | -0.23 (0.1)
MEZm_mod04 | -0.45 (0.001) | -0.24 (0.09) | -0.41 (0.003)
MEZm_mod21 | -0.07 (0.6) | -0.037 (0.8) | -0.059 (0.7)
MEZm_mod22 | -0.079 (0.6) | -0.04 (0.8) | -0.057 (0.7)
MEZm_mod07 | 0.44 (0.001) | 0.2 (0.2) | 0.41 (0.003)
MEZm_mod16 | 0.14 (0.3) | 0.048 (0.7) | 0.14 (0.3)
MEZm_mod10 | -0.2 (0.2) | 0.036 (0.8) | -0.34 (0.02)
MEZm_mod11 | -0.18 (0.2) | -0.072 (0.6) | -0.17 (0.2)
MEZm_mod20 | -0.13 (0.4) | -0.08 (0.6) | -0.077 (0.6)
MEZm_mod24 | -0.064 (0.7) | -0.032 (0.8) | -0.048 (0.7)
MEZm_mod05 | 0.34 (0.02) | 0.52 (1e-04) | -0.0067 (1)
MEZm_mod17 | -0.068 (0.6) | -0.02 (0.9) | -0.066 (0.7)
MEZm_mod08 | -0.098 (0.5) | 0.0078 (1) | -0.12 (0.4)
MEZm_mod09 | -0.29 (0.04) | -0.17 (0.2) | -0.21 (0.2)
MEZm_mod12 | -0.13 (0.4) | -0.062 (0.7) | -0.096 (0.5)
MEZm_mod15 | 0.46 (9e-04) | 0.75 (4e-10) | 0.0094 (0.9)
MEZm_mod06 | 0.87 (5e-16) | 0.54 (5e-05) | 0.66 (2e-07)
MEZm_mod14 | 0.62 (1e-06) | 0.39 (0.005) | 0.44 (0.002)
Figure 5. Module eigengene correlations with leaves sampled before and after anthesis and silking. All_Leaf_Stages represents all nine leaf samples. Vegetative leaves are the V1, V2, and V5 leaves. Reproductive leaves are samples from the second leaf above the top ear (R1 and 10, 17, 24, and 31 DAP). See Table I for definitions of developmental stages.