

Running head: Transcriptional modules in maize development

Corresponding Author: Lewis Lukens, Associate Professor, Department of Plant Agriculture, Crop Science Building, University of Guelph, 50 Stone Rd. E., Guelph, Ontario, N1G 2W1, Canada. Phone: 519-824-4120 x. 52304. Fax: 519-763-8933. Email: [email protected]

Research category: Genome Analysis

Plant Physiology Preview. Published on February 7, 2013, as DOI:10.1104/pp.112.213231

Copyright 2013 by the American Society of Plant Biologists


A developmental transcriptional network for Zea mays defines coexpression modules.

Gregory S. Downs1, Yong-Mei Bi2, Joseph Colasanti2, Wenqing Wu2, Xi Chen3, Tong Zhu3, Steven J. Rothstein2, Lewis N. Lukens1

1 Department of Plant Agriculture; University of Guelph; Guelph, ON, Canada, N1G2W1
2 Department of Molecular and Cellular Biology; University of Guelph; Guelph, ON, Canada, N1G2W1
3 Syngenta Biotechnology Inc.; 3054 Cornwallis Road, Research Triangle Park, NC, USA, 27709

Summary: Through hierarchical clustering of transcript abundance data across a diverse

set of tissues and developmental stages in maize, we have identified a number of

coexpression modules which describe the transcriptional circuits of maize development.

Keywords:

Zea mays Development Plant Gene Expression Regulation Gene Regulatory Networks/genetics Maize Transcriptome Oligonucleotide Array Sequence Analysis Systems Biology Bioinformatics


Financial Source: This work was made possible by support from the Ontario Research Fund and Natural Sciences and Engineering Research Council of Canada.


ABSTRACT

Here we present a genome-wide overview of transcriptional circuits in the agriculturally

significant crop species Zea mays. We examined transcript abundance data at 50

developmental stages, from embryogenesis to senescence, for 34,876 gene models and

classified genes into 24 robust coexpression modules. Modules were strongly associated

with tissue types and related biological processes. Sixteen of the 24 modules (67%) have

preferential transcript abundance within specific tissues. One-third of modules had an

absence of gene expression in specific tissues. Genes within a number of modules also

correlated with the developmental age of tissues. Coexpression of genes is likely due to

transcriptional control. For a number of modules, key genes involved in transcriptional

control have expression profiles that mimic the expression profiles of module genes,

although the expression of transcriptional control genes is not unusually representative

of module gene expression. Known regulatory motifs are enriched in several modules.

Finally, of the 13 network modules with more than 200 genes, three contain genes that

are notably clustered (p<0.05) within the genome. This work, based on a carefully

selected set of major tissues representing diverse stages of Zea mays development,

demonstrates the remarkable power of transcript-level coexpression networks to identify

underlying biological processes and their molecular components.


INTRODUCTION

Systems biology approaches recently have begun to elucidate the patterns of

transcriptome organization. In contrast to analyses that compare whole transcriptomes of

samples and those that compare mean levels of gene expression differences between

samples, the systems biology strategy integrates expression patterns of single genes to

infer their common biological function. Genes with coordinated expression across

samples are hypothesized to be co-regulated in response to external and internal cues and

to be regulated by similar transcription factors (Moreno-Risueno et al., 2010). Inferring

gene regulatory networks from transcriptome data and subsequently testing the attributes

of the network provides a system-wide view of developmental processes.

A number of studies have pooled diverse assortments of publicly available microarray

data to identify clusters of plant genes with shared patterns of expression (Fierro et al.,

2008; Ficklin et al., 2010; Mochida et al., 2011). A number of modules are conserved

across species (Ficklin and Feltus, 2011; Movahedi et al., 2011; Mutwil et al., 2011). For

example, modules associated with drought stress responses and cellulose biogenesis are

common to Hordeum vulgare, Arabidopsis thaliana and Brachypodium distachyon

(Mochida et al., 2011).

The great functional and morphological variation in plant tissue types arises from

differential regulation of a finite set of genomic transcripts. Microarray technology has

been used to compare gene transcript abundances between different tissues (Ma et al.,

2005; Schmid et al., 2005; Benedito et al., 2008; Jiao et al., 2009; Sekhon et al., 2011).


These studies have identified genes that are transcribed in specific organs and examined

the relationships among tissue expression patterns using principal components

transformation. These studies have also noted that similar tissues had more highly

correlated gene transcript abundances than less similar tissues; e.g., the correlation

coefficient of two developmental stages of leaf tissue is greater than the correlation

coefficient of leaf with another tissue.

Here, we have constructed a developmental gene expression network from microarray

transcriptome profiles of 50 Zea mays (maize) tissues across different stages of

development and identified modules of putative, co-regulated genes within this network.

We characterized the attributes of modules to begin to understand transcriptome

organization. Specifically, we investigated whether network modules are associated with

specific tissue types and are enriched for specific biological processes. Further, we

determined whether modules are specifically excluded from tissues, and if modules

reflect developmentally responsive processes. Moreover, we investigated the centrality of

transcription factors within modules, and if modules share common cis-regulatory motifs.

Finally, we determined whether modules contain genes that are clustered within the

genome. This work explores the gene expression network throughout maize development

for an inbred genotype grown under controlled conditions and describes the remarkably

discrete functionalities of modules within the network.


RESULTS

Gene networks of the maize developmental transcriptome

We set out to investigate the organization of the maize transcriptome throughout

development by analyzing a microarray data set generated from three biological

replicates of 50 tissue types (Table I). Samples were derived from all developmental

stages, from early embryo to senescence-stage leaves, including anthers, cob, ear,

embryo, endosperm, husk, leaf, ovule, pericarp, root, silk, stalk, and tassel. Several of

these tissues, including ear, leaf, and tassel were sampled at multiple stages of

development (Table I, Figure 1). All processed RNAs were hybridized to a custom

microarray with 82,661 probe-sets and 1,322,576 probes. Of these, 55,672 probe sets

were expressed in at least one tissue (Figure 2), and 33,664 mapped to the filtered gene

set of the maize gene models, release 4a.53 (Schnable et al., 2009). An additional 9,919

probe sets that were not annotated as genes mapped to the maize genome, and 12,089

probe sets did not match the genome using our criteria (Figure 2). To ensure the highest

level of data quality, among the probe sets that matched to the genome, we removed

redundant and non-specific probes prior to data analysis. In the end, we examined 34,876

probe-sets (Figure 2, Table S1). For clarity, we refer to these probe sets as genes.

We clustered all sample transcriptome profiles using the flashClust function of WGCNA

(Langfelder and Horvath, 2008) to obtain an overview of transcriptome relationships.

With few exceptions, biological replicate arrays cluster within a group containing only

the replicates of the tissue at that stage, and arrays from the same tissue cluster together


(Figure S1). Figure 3 shows a dendrogram of all 50 tissues constructed from average

transcript abundances across replicate arrays. Distinct groups contain leaf, root, seed, and

silk expression profiles. Transcriptomes from tissues harvested at different developmental

stages largely cluster together, as do tissues that have strong developmental similarity

such as the V7 tassel and V7 ear. Nonetheless, some groups contain mixed tissues or do

not contain all arrays from a tissue type. For example, the R1 stalk is grouped with leaf

transcriptomes, and the pre-photosynthetic VE leaf did not cluster with other leaves

(Figure 3). The inner-most husk, a modified leaf, clustered with inflorescence tissues

(cob, ovule, silk, and tassel) rather than leaves (Figure 3).
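
The sample-level clustering described above can be illustrated with a small sketch. It is not the flashClust implementation: the distance (1 minus the Pearson correlation), the average-linkage rule, and the toy sample names and values are all assumptions chosen for illustration, since the text does not state the linkage method.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles):
    """Agglomerative clustering of samples; returns the merges in order."""
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = list(clusters)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                # Average pairwise distance (1 - r) between the two clusters.
                pairs = [(m, n) for m in clusters[a] for n in clusters[b]]
                d = sum(1 - pearson(profiles[m], profiles[n]) for m, n in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        members = sorted(clusters.pop(a) + clusters.pop(b))
        clusters[a + "+" + b] = members
        merges.append((members, round(d, 3)))
    return merges

# Hypothetical mean transcript abundances for five genes in four samples.
profiles = {
    "leaf_V1": [9.0, 8.5, 2.0, 1.0, 3.0],
    "leaf_V5": [8.8, 9.0, 2.2, 1.1, 2.9],
    "root_V1": [2.0, 1.5, 9.0, 8.0, 3.1],
    "root_V5": [2.1, 1.2, 8.7, 8.4, 3.0],
}
merges = average_linkage(profiles)
for members, distance in merges:
    print(members, distance)
```

In this toy example the two leaf samples and the two root samples each join before the tissues are merged, which mirrors the pattern reported for replicate arrays and related tissues.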

We constructed a weighted gene coexpression network with the R software WGCNA by

transforming the 34,876 genes' pairwise Pearson correlation coefficients into a weighted

adjacency matrix (Langfelder and Horvath, 2008). We created a signed network, which

allows modules to contain both positively and negatively correlated genes since

transcripts involved in one process may be up- or down-regulated. The topological

overlap measure, or TO (Li and Horvath, 2007), was used to transform the adjacency

matrix into a coexpression distance matrix. Genes were clustered hierarchically, and a

dynamic tree-cutting algorithm cut the dendrogram and defined 49 modules. Gene

module assignments are given in Table S2. The modules range in size from 30 to 4,370

genes (mean 712, median 123). To validate modules, we compared the mean TO value

for each module to a distribution of mean TO values for 50,000 iterations of modules

composed of a randomly selected group of genes (Ravasz et al., 2002; Yip and Horvath,

2007). We focused on the 24 of the 49 modules that were validated as significant


(P<0.05; Table S3). These 24 modules contain 30,768 genes. Module eigengenes (ME)

were calculated for each module as the first principal component of the gene expression

matrix for the module, and these can be considered as a vector of gene expression values

characteristic of the module. Correlations between the MEs for each module indicate that

most modules have an eigengene with similar correlation patterns as the eigengene from

one or more other modules (Figure S2). The use of more permissive criteria for module

identification could group these modules together, but subsequent analyses revealed that

they have distinct attributes.
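
The topological overlap and module validation steps can be sketched as follows. This is a minimal pure-Python illustration rather than the WGCNA code: it assumes a weighted adjacency matrix is already available, uses one common form of the topological overlap measure, works on a hypothetical 8-gene matrix, and draws 2,000 random gene sets instead of the 50,000 used in the study. The function names are invented for this example.

```python
import random

def topological_overlap(adj):
    """Topological overlap for a symmetric weighted adjacency matrix in [0, 1]."""
    n = len(adj)
    # Connectivity: sum of each gene's adjacencies to all other genes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom

def mean_overlap(tom, genes):
    """Mean topological overlap over all gene pairs in a gene set."""
    pairs = [(a, b) for idx, a in enumerate(genes) for b in genes[idx + 1:]]
    return sum(tom[a][b] for a, b in pairs) / len(pairs)

def module_p_value(tom, module, iterations=2000, seed=0):
    """Share of random, equally sized gene sets whose mean TO reaches the module's."""
    rng = random.Random(seed)
    observed = mean_overlap(tom, module)
    all_genes = list(range(len(tom)))
    hits = 0
    for _ in range(iterations):
        if mean_overlap(tom, rng.sample(all_genes, len(module))) >= observed:
            hits += 1
    return (hits + 1) / (iterations + 1)  # add-one correction (an assumption)

# Hypothetical adjacency: genes 0-2 are tightly connected, the rest weakly.
n = 8
module = [0, 1, 2]
adj = [[1.0 if i == j else (0.9 if i in module and j in module else 0.1)
        for j in range(n)] for i in range(n)]
tom = topological_overlap(adj)
p = module_p_value(tom, module)
print(round(tom[0][1], 3), round(tom[0][3], 3), p)
```

A module passes this kind of check when few random gene sets of the same size are as densely interconnected as the module itself.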

Many modules correlate with specific tissue types

We investigated whether each module's eigengene had significantly higher expression in

specific tissues relative to all other tissues. Tissue-specific modules may also contain

genes with low expression in one tissue type relative to others. Sixteen of the 24 modules

(67%) are moderately to highly correlated with tissue type (r>0.4; Figure 4). One or more

modules are positively correlated with anthers, ear, embryo, endosperm, leaf, pericarp, root, and

tassel. No module is positively correlated with cob, floret, husk, ovule, stalk, or silk (Figure 4). Of

the 24 modules, eight had eigengenes that are moderately to highly negatively correlated

(r<-0.4) with anthers, endosperm, leaf, root and stalk (Figure 4). Two of these eight

modules also had positive correlations with tissue types, so only two of the 24 modules

were not associated with a specific tissue type.
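
As an illustration of how a module eigengene can be related to a tissue type, the sketch below computes the first principal component of a small expression matrix by power iteration and correlates it with a tissue indicator. The 0/1 coding of tissue membership, the toy expression values, and the function names are assumptions made for this example; the text does not specify how tissue type was encoded.

```python
import math

def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]

def pearson(x, y):
    """Pearson correlation computed from standardized values."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

def module_eigengene(expr, iterations=100):
    """First principal component (over samples) of a genes x samples matrix,
    found by power iteration on the standardized gene profiles."""
    rows = [standardize(r) for r in expr]
    n = len(rows[0])
    v = rows[0][:]  # start from one gene's profile (never the constant vector)
    for _ in range(iterations):
        loadings = [sum(a * b for a, b in zip(r, v)) for r in rows]
        v = [sum(l * r[s] for l, r in zip(loadings, rows)) for s in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Orient the eigengene so that it tracks average module expression.
    avg = [sum(r[s] for r in rows) / len(rows) for s in range(n)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-x for x in v]
    return v

# Hypothetical module of three genes measured in six samples;
# the first two samples are root, the remaining four are other tissues.
expr = [
    [9.0, 8.0, 2.0, 1.0, 2.0, 1.0],
    [7.0, 9.0, 1.0, 2.0, 1.0, 2.0],
    [8.0, 8.0, 2.0, 2.0, 1.0, 1.0],
]
is_root = [1, 1, 0, 0, 0, 0]
eigengene = module_eigengene(expr)
r = pearson(eigengene, is_root)
print(round(r, 2))
```

Under the threshold used in the text, a module with r > 0.4 would count as positively correlated with the tissue and one with r < -0.4 as negatively correlated.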

To investigate the robustness of tissue-associated modules, we cross-referenced genes'

modules with the list of tissue-specific genes reported by Sekhon et al. (2011) in a survey


of maize transcriptomes. Their study reported 863 tissue-specific genes, of which we

could trace 276 to the present experiment, due to differences between microarray

platforms and our stringent oligo to gene mapping criteria (Figure 2). Remarkably, 75%

(206) of the genes identified by Sekhon et al. as tissue specific were within network

modules that were significantly correlated with the same tissue (Table II, Table S4). The

70 (25%) tissue specific genes reported by Sekhon et al. that did not map to a module

may be, in part, explained by environmental effects that altered transcription profiles

between experiments. In our study all plants were grown in a controlled environment,

whereas Sekhon et al. (2011) harvested young plants from the greenhouse and older

plants from the field.

Modules are highly enriched for biological processes

Tissue specific modules are often characterized by specific biological functionalities. We

used the topGO package in R to identify Gene Ontology (GO) terms which appear in

modules more frequently than expected by chance. Nine of the 16 modules that positively

correlate with specific tissues are highly enriched with Gene Ontology biological

processes (Fisher's Exact Test, P<0.0001; Table S5). These modules are associated with

anther, ear, embryo, endosperm, leaf, root, pericarp, and tassel. The over-represented GO

terms are often consistent with known tissue attributes. Module Zm_mod12 is associated

with anthers (1010 genes, tissue correlation r=0.55, p=4 × 10^-5) and is enriched for

genes related to “sexual reproduction” (GO:0019953) (Table S5). The module

Zm_mod11 (1109 genes, r=0.96, p=2 × 10^-29) is correlated with roots and has an over-

representation of genes involved in “response to oxidative stress” (GO:0006979),


“oxidation reduction” (GO:0055114), and “hydrogen peroxide catabolic process”

(GO:0042744) (Table S5). A number of tissues are associated with different modules,

suggesting that functionally distinct modules can share tissue specificity. Leaves have

two modules with significant GO terms: Zm_mod06 and Zm_mod07. Zm_mod06 (2069

genes, r=0.83, p=6 × 10^-14) is enriched for six GO biological process terms relating to

photosynthesis (Table S5). This module contains almost every enzyme in three pathways

important for chlorophyll biosynthesis: biosynthesis of chlorophyllide a, biosynthesis of

phytyl diphosphate, and biosynthesis of chlorophyll a (Figure S3). Phytyl diphosphate is

the source of the phytyl chain in chlorophylls, and chlorophyllide a is an intermediate

compound in chlorophyll biosynthesis. The biological process “glycolipid transport”

(GO:0046836) is over-represented within module Zm_mod07 (1808 genes, r=0.44,

p=1 × 10^-3) (Table S5). Together, the annotations provide strong evidence of the

functionalities of many tissue-specific modules.
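
The enrichment test behind these results can be sketched as a one-sided Fisher's exact test on a single GO term. This is a plain per-term calculation and an assumption about the setup: it ignores the GO graph structure that topGO can also take into account, and the counts in the example are hypothetical.

```python
from math import comb

def fisher_enrichment_p(module_hits, module_size, background_hits, background_size):
    """One-sided Fisher's exact test: probability of seeing at least
    `module_hits` annotated genes in a module of `module_size` genes drawn
    from `background_size` genes, `background_hits` of which are annotated."""
    max_hits = min(module_size, background_hits)
    total = comb(background_size, module_size)
    tail = sum(
        comb(background_hits, x) * comb(background_size - background_hits, module_size - x)
        for x in range(module_hits, max_hits + 1)
    )
    return tail / total

# Hypothetical counts: 12 of 200 module genes carry a GO term that
# annotates 60 of 5,000 background genes.
p_term = fisher_enrichment_p(12, 200, 60, 5000)
print(p_term)
```

A term would be reported as enriched when this probability falls below the chosen cutoff (P<0.0001 in the text).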

Four of the eight modules negatively correlated with a tissue type are significantly

enriched for GO biological process terms. “Glycine betaine biosynthetic process”

(GO:0031456) was enriched in module Zm_mod02 (4081 genes; r=−0.99, p=7 × 10^-40),

which is negatively associated with anther tissue. Zm_mod01 is also negatively

associated with anthers (4370 genes; r=−0.47, p=6 × 10^-4) and was enriched for

“translation” (GO:0006412) and “ribosome biogenesis” (GO:0003743) (Table S5). Genes

within modules that were neither positively nor negatively correlated with a tissue did not

have significant enrichment for any GO term (Figure 4, Table S5).


Modules capture developmental stages within tissues.

The relationships among leaf transcriptomes and, to a lesser degree, embryo transcriptomes

reflected the developmental age of the samples (Figure 3). Leaf samples include two

samples of juvenile leaves and one sample of adult leaf prior to flowering (V1, V2, and

V5; Table I). We also sampled the 2nd leaf above the top ear one day before pollination

and at four time points following pollination (10, 17, 24, and 31 DAP). A number of

modules are correlated or anticorrelated either with green leaves sampled prior to

anthesis/silking (V1, V2, and V5) or with green leaves sampled after flowering (R1 and

10, 17, 24, and 31DAP; Figure 5). Other modules, including Zm_mod06, enriched for

genes related to photosynthesis, have correlations with all leaf samples. Unlike leaves

before and after flowering, no modules differentiated the two juvenile stage leaves from

the single, adult leaf. A heatmap of module eigengene correlations with individual tissues

is shown in Figure S4.

The module eigengene for Zm_mod13 (691 genes) is very strongly correlated with

embryo age (10, 17, 24, or 31 DAP; r=0.96, p=4 × 10^-27). This module is enriched for the

GO term "embryonic development" (GO:0009790) (Table S5). Chloroplastic thiamine

thiazole synthase 2, THI1-2 (GRMZM2G074097), an enzyme in the thiamine

biosynthetic pathway, is within the Zm_mod13 module. The transcript abundance of

thi1-2 increases in embryos from 15 to 36 days after pollination (Belanger et al., 1995)

(Figure S5).

Expression of transcription-related genes within modules


Transcription of specific sets of genes triggers molecular cascades that determine

developmental fates (Kaufmann et al., 2010). Modules contain transcription related genes

that are correlated with module eigengenes. For example, the leaf-associated module

Zm_mod14 (315 genes; r=0.61, p=2 × 10^-6) contains sigma factor SIG2A of RNA

polymerase (GRMZM2G143392) with an eigengene correlation, or module membership

(MM), of 0.74 (Table S2). The sigma factor is a nuclear-encoded gene whose product is

transported to the chloroplast where it facilitates plastid RNA polymerase (PEP) binding

to chloroplastic promoters, predominately in leaf tissue containing mature chloroplasts

(Lysenko, 2007). The eigengene of the endosperm-associated module Zm_mod09 (1403

genes; r=0.69, p=3 × 10^-8) is highly correlated both with GRMZM2G118205, which

encodes a protein similar to the Polycomb group FIE1 (FERTILIZATION-

INDEPENDENT ENDOSPERM 1) protein, and GRMZM2G146283, which encodes a

PBF (prolamin box-binding factor) protein (MM=0.93 and MM=0.98, respectively).

Inheritance of a loss of function fie1 allele by the Arabidopsis thaliana female

gametophyte results in embryo abortion (Ohad et al., 1996), and expression of the maize

fie1 ortholog is restricted to embryo and endosperm tissue (Springer et al., 2002). PBF is

thought to activate the expression of prolamin seed storage protein encoding genes during

endosperm development by binding to the prolamin box motif (TGTAAAG) (Vicente-

Carbajosa et al., 1997).

We hypothesized that genes related to transcriptional control would have expression

patterns more highly similar to each module's eigengene than do other genes. In ten

modules, transcription related genes have one of the top five ranks when genes within the


module are sorted by descending MM (Table S6). Nonetheless, the top rank of the

transcription-related gene is not significantly higher than expected for other genes within

any module (significance threshold P=0.01; Table S6). We also evaluated whether genes classified with GO

terms related to transcription have on average higher topological overlap scores than

expected. The observed connectivity of transcription-related genes was not significantly

greater than expected for any module (data not shown).
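
One way to frame the rank comparison above is to ask how likely it is that the best-ranked transcription-related gene would sit at least as high by chance. The sketch below is an assumption about how such a test could be set up, not the procedure used for Table S6; the gene names and module membership values are hypothetical.

```python
from math import comb

def top_rank(membership, flagged):
    """Best (smallest) rank of any flagged gene when genes are sorted by
    descending module membership (MM)."""
    ranked = sorted(membership, key=membership.get, reverse=True)
    return min(ranked.index(gene) + 1 for gene in flagged)

def prob_top_rank_at_most(rank, n_genes, n_flagged):
    """Probability that the best of `n_flagged` genes lands at `rank` or
    better when the flagged genes occupy random positions among `n_genes`."""
    return 1.0 - comb(n_genes - rank, n_flagged) / comb(n_genes, n_flagged)

# Hypothetical module of five genes, two of them transcription-related.
membership = {"geneA": 0.98, "geneB": 0.93, "geneC": 0.74, "geneD": 0.50, "geneE": 0.20}
transcription_related = {"geneB", "geneE"}

best = top_rank(membership, transcription_related)
p_rank = prob_top_rank_at_most(best, len(membership), len(transcription_related))
print(best, p_rank)
```

With many transcription-related genes in a large module, a top-five rank is expected fairly often by chance, which is consistent with the lack of significance reported here.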

Coordinated regulation of module genes may be in part due to shared transcription factor

regulation. Transcription factors bind to specific promoter motifs upstream of the

transcription initiation site. We used Fisher’s Exact Test to determine if any one of 106

previously reported maize regulatory motifs is over-represented in the upstream

sequences of module genes. Six of the 24 modules are enriched for ten motifs (Table III).

An interesting motif is CC(A/G)CCC which is over-represented in Zm_mod01 and

Zm_mod06. These modules are negatively and positively correlated with leaf tissue,

respectively. The MNF1 (mitochondrial nucleoid factor 1) transcription factor is

associated with CC(A/G)CCC and initiates transcription of Ppc1 (C4-type

phosphoenolpyruvate carboxylase) in Zea mays mesophyll cells exposed to light

(Morishima, 1998). While plant promoters are often described as compact, there are

exceptions to this general rule. An examination of 1 kb, 1.5 kb, and 2 kb upstream

sequences identified some novel, over-represented promoter motifs within modules. Of

the ten motifs identified as over-represented in the 500bp upstream regulatory sequences

of certain modules, six were shared when 1 kb of upstream sequences was examined, and

four were shared when 2 kb was examined (Table S7).
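
Counting motif occurrences in upstream sequences, the input to the enrichment test described above, can be sketched as follows. The degenerate motif CC(A/G)CCC comes from the text; scanning both strands, the helper names, and the toy sequences are assumptions made for this example.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
MOTIF = re.compile("CC[AG]CCC")  # CC(A/G)CCC written as a regular expression

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def has_motif(upstream, pattern=MOTIF):
    """True if the motif occurs on either strand of the upstream sequence."""
    seq = upstream.upper()
    return bool(pattern.search(seq) or pattern.search(reverse_complement(seq)))

def motif_counts(module_seqs, other_seqs, pattern=MOTIF):
    """Counts for a 2x2 table: (module with, module without, other with, other without)."""
    module_with = sum(has_motif(s, pattern) for s in module_seqs)
    other_with = sum(has_motif(s, pattern) for s in other_seqs)
    return (module_with, len(module_seqs) - module_with,
            other_with, len(other_seqs) - other_with)

# Hypothetical short upstream fragments (real inputs would be 500 bp to 2 kb).
module_seqs = ["TTCCACCCAA", "AAGGGTGGTT", "ATATATAT"]
other_seqs = ["ATATAT", "CCGCCCTT", "GGGGAAAA"]
counts = motif_counts(module_seqs, other_seqs)
print(counts)
```

The four counts form the contingency table that a Fisher's Exact Test would then evaluate for each motif and module.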


Module genes are significantly clustered within the genome

Previous work has shown that transcript levels of physically proximate genes are, on

average, more highly correlated than expected by chance (Caron et al., 2001; Lercher et

al., 2002; Zhan et al., 2006). We investigated whether module genes tended to have non-

random genomic positions. Under the null hypothesis, the genomic position of a module's

gene in the genome is independent of all other module genes, and the distribution of

module genes per chromosomal interval is expected to follow a Poisson distribution with

an equal mean and variance. For the thirteen modules that contain more than 200 genes

(Table IV), we calculated the module dispersion score: the average number of module

member genes within a 300kb segment of chromosomal DNA divided by the variance.

Three of the thirteen modules have genes that are significantly (P<0.05) clustered (Table

IV), although the mean number of genes per interval is lower than the variance for every

module. Two of these, Zm_mod10 (1147 genes, r=0.53, p=9 × 10^-5) and Zm_mod11

(described above), are positively associated with roots (Figure 4). The other, Zm_mod01,

is negatively associated with leaf tissue.
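
The dispersion score defined above (mean count per 300 kb interval divided by the variance) can be computed as in the sketch below. The single toy chromosome, the gene positions, and the use of the sample variance are assumptions for illustration; how significance was assessed is not shown here.

```python
from collections import Counter
from statistics import mean, variance

def dispersion_score(positions, chromosome_length, window=300_000):
    """Mean number of module genes per window divided by the variance.
    Values near 1 match a Poisson pattern; values below 1 indicate clustering.
    Raises ZeroDivisionError if every window holds the same number of genes."""
    n_windows = -(-chromosome_length // window)  # ceiling division
    counts = Counter(pos // window for pos in positions)
    per_window = [counts.get(w, 0) for w in range(n_windows)]
    return mean(per_window) / variance(per_window)

# Hypothetical gene start positions on a 3 Mb chromosome (10 windows).
clustered = [10_000, 50_000, 120_000, 200_000, 290_000]      # all in one window
spread = [100_000, 400_000, 700_000, 1_000_000, 1_300_000]   # one per window

clustered_score = dispersion_score(clustered, 3_000_000)
spread_score = dispersion_score(spread, 3_000_000)
print(clustered_score, spread_score)
```

The clustered set gives a score well below 1 because its variance across windows far exceeds its mean, while the evenly spread set gives a score above 1.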

DISCUSSION

The concept of modularity in transcriptome analyses is that transcript abundance data can

be partitioned into a collection of discrete and informative modules. Each module is self-

contained and presumably functions to perform a distinct task separate from the tasks of

other modules. At the same time, the components of a transcriptome are dynamically

interconnected, so a complex web of interactions defines transcript patterns and


abundances. To investigate the interconnections and modular structure of plant

developmental transcriptomes, we constructed a transcriptional network from a high

quality microarray data set derived from 50 Zea mays tissue types and developmental

stages that represent the range of maize morphogenesis and span developmental time

from embryogenesis to senescence. We clustered transcripts into a hierarchy with nested

modules of increasing sizes and decreasing interconnectedness, and we identified 49

modules, 24 of which have robust inter-connectivity.

With a custom-designed Affymetrix microarray chip to assay transcript levels we were

able to map 60% (33,664 of 55,672) of the probe-sets for which we detected target

hybridization to the high quality, filtered gene set originally reported by Schnable et al.

(2009; Figure 2). Similarly, Sekhon et al. (2011) designed probe sets for a Nimblegen

microarray with maize transcript assemblies and FGENESH gene models of the B73

genome sequence and found that about 70% of the probes matched filtered maize gene

models. The maize filtered gene set is a conservative list of maize genes, and of the

55,672 probe-sets for which we detected target hybridization, 9,919 (18%) map to the

B73 genome and do not map to the maize gene models (Figure 2). Twenty-two percent of

the expressed probes did not match the genome, perhaps because some probes arise from

mis-assembled unigenes. The maize genome also has gaps, and some transcripts may

have arisen from genes that were not sequenced. Finally, some probe sequences may be

derived from a transcribed gene that is absent from the B73 genome. Using comparative

genomic hybridization (CGH) on Nimblegen arrays, Springer et al. (2009) found

megabase-size B73 regions that contained genes and were absent in the maize inbred


Mo17 genome. Beló et al. (2010) and Swanson-Wagner et al. (2010) also have identified

thousands of potential copy-number variations (CNVs) among Zea mays genomes. After

eliminating cross-hybridizing and redundant probe sets, we identified 34,876 genes.

Network modules have strong associations with specific tissues and biological processes

Modules comprise genes that have similar patterns of expression across all

tissues. Nonetheless, twenty-two of our 24 robust modules (92%) are characterized by

transcripts that are preferentially expressed or repressed within a specific tissue type

relative to all other tissue types (Figure 4). Our results indicate organ identity is a primary

factor that explains transcriptome variation throughout plant development and suggest

that organ identity is the key determinant of cellular function. This discovery echoes the

identification of numerous mutants that have aberrant cell structures but nonetheless

demonstrate normal organ development (Smith et al., 1996). Whole transcriptome

comparisons consistently show that age, cell type, and environmental stimuli have a

relatively minor effect on transcriptional profiles relative to organ type (Ma et al., 2005;

Schmid et al., 2005; Druka et al., 2006; Jiao et al., 2009; Sekhon et al., 2011). In addition,

large numbers of tissue specific transcripts have been identified in plants (Ma et al., 2005;

Druka et al., 2006; Sekhon et al., 2011). For example, of 18,481 detected transcripts from

barley, 650 were expressed in only a single tissue type (Druka et al., 2006). Seventy-five

percent of the maize tissue specific genes reported by Sekhon et al. (2011) that we

examined are found in the appropriate, tissue-specific modules. Although environmental

stimuli may rewire underlying network architectures (Luscombe et al., 2004), the


congruence of these results indicates that the rough outlines of developmental network

topologies are highly robust. Key expectations of tissue specific modules are first that

mutations within genes most highly connected, or central, to a tissue specific module will

have a phenotypic effect that preferentially affects that tissue. Second, as a number of

elicitor molecules drive organ change (for example, GA can induce floral feminization),

genes differentially expressed in response to elicitor treatment should be over-represented

in specific, developmental modules (Zhan and Lukens, 2010).

Maturation signals and switches in cell identity also activate distinct transcriptional

regulatory modules within a single organ. By examining leaf tissues of different ages, we

identified modules preferentially expressed in leaves prior to anthesis and silking and

modules preferentially expressed after flowering. We also identified one module

eigengene with a strong correlation with embryo age. The detection of modules

correlated with different stages of a single organ was possible because of the breadth of

the transcription profiles collected in this study.

We find a high congruence between modules’ associations with specific tissue types and

biological processes. Twelve of the 24 verified modules were enriched for genes involved

in specific biological processes, and all were positively or negatively associated with

specific tissue types. Ficklin and Feltus (2011) developed a maize transcriptome network

using 297 microarray datasets from various maize tissues and genotypes grown in a

number of conditions, including forty-eight arrays from pulvinus and nine arrays of

methylation filtered genomic DNA. They identified clusters of genes enriched for


ribosome and translation, seed storage activity, and photosynthesis (Ficklin and Feltus,

2011). It is likely that the robust signal of the different tissues within these samples

contributed to the functional enrichment of modules. Nonetheless, it would be interesting

to identify if modules clustered around functions independently of tissue type.

We expect that modules are a valuable resource for predicting gene function. Many of the

genes with high module membership have unknown functions. For these sparsely

annotated genes, their hub status in a particular tissue-specific module generates novel

hypotheses through the principle of guilt by association. For example, the transcript

detected by probe set Zm028519_at is highly correlated with the Zm_mod13 eigengene

(MM=0.98) and with embryo age, with a similar expression profile in embryo to

GRMZM2G074097 (Figure S5) discussed previously. The Unigene sequence on which

the Zm028519_at probe set was based is homologous (blastn versus EST database,

e-value = 3 × 10^-118) to an EST from a Sorghum bicolor embryo library. Nonetheless, the

transcript does not map to the maize filtered gene set, has no known functions, and is not

detected in any tissue other than embryo (Figure S5).

Transcription-related genes are correlated with module eigengenes but are not notably central to modules.

Genes within a module are likely transcriptionally regulated. A number of transcription-

associated genes have high module memberships (Table S2). These genes include a

chloroplastic RNA polymerase protein, a chromatin remodeling enzyme, and a DOF

transcription factor. Nonetheless, transcription factors and other genes involved in


transcription were not unusually central to modules relative to genes with other functions

(Table S6). We propose three explanations for this observation. First, transcription factors

that are expressed in only one tissue may be rare. Transcription factors likely are active in

more than a single condition and can alter their regulatory interactions between

conditions (Luscombe et al., 2004; Brady et al., 2011). Second, precise regulation of

module genes may arise through the combinatorial protein interactions of transcription

factors (Smaczniak et al., 2012). Finally, transcription factors that direct organ cell

identity may act transiently to establish the identity early in development, and the present

study did not capture this time point. For example, heritable silencing of the expression of

transcription factor FLC (FLOWERING LOCUS C) is accomplished by transient

expression of VIN3 (vernalization-insensitive 3) that guides the heritable, epigenetic

modification of FLC (Sung and Amasino, 2004). We note that binding sites of

transcription factors can be over-represented among module genes. Six modules have a

significant over-representation of ten known promoter motifs (Table III). The

genome-wide, verified binding sites of key developmental transcription factors (Bolduc et al.,

2012; Morohashi et al., 2012) may further elucidate how genes within modules are co-

regulated.

Some module members are physically clustered in the maize genome.

Of the thirteen modules with more than 200 genes, genes within three modules are

significantly more co-located in chromosomal regions than expected by chance (Table

IV). Clusters of functionally related genes may arise because of a shared chromatin

environment that promotes their co-regulation (Udvardy et al., 1985). Alternatively,


epistasis among genes within modules seems feasible, as modules contain genes that act

in biochemical pathways (e.g. Figure S3). Selection may have favored linkage of epistatic

genes to reduce recombination between favorable alleles, thus contributing to the

variability in linkage disequilibrium decay across the maize genome (Inghelandt et al.,

2011).

Here, we begin to investigate factors that explain maize transcriptome variation across

development and the regulatory basis for that variation. Future work will improve the

resolution of the maize transcriptome modules by incorporating in-depth expression data

of diverse RNAs. The functional relationships among genes across development also will

likely be improved through integration of other data (Zhu et al., 2008). Finally, a major

objective will be to functionally characterize modules and to investigate how to alter

modules to drive developmental changes.

MATERIALS AND METHODS

Growth conditions and tissue sampling

An elite Syngenta Zea mays (maize) inbred (SRG200) was grown in a greenhouse at the

University of Guelph during the summer of 2007. Growth conditions were 16 hour days

(~600 μmol m⁻² s⁻¹) at 28 °C, 8 hour nights at 23 °C, and 50% relative humidity. Plants

were grown semi-hydroponically in pots containing Turface® clay, watered with a

modified Hoagland’s solution containing: 0.4 g/L 28-14-14 fertilizer, 0.4 g/L 15-15-30

fertilizer, 0.2 g/L NH4NO3, 0.4 g/L of MgSO4•7H2O and 0.03 g/L of micronutrient mix

(S, Co, Cu, Fe, Mn, Mo and Zn). Plant samples representing fifty developmental stages

were collected for RNA extraction (Table I, Figure 1, Figure S6). Three biological

replicates per sample were harvested in the middle of the day to minimize complications

due to diurnal changes in C and N metabolism. Total RNA was isolated and used for

cDNA and cRNA synthesis, and labeling followed a standard protocol recommended by

Affymetrix. Labeled cRNAs were fragmented and applied to a maize custom GeneChip

microarray for molecular hybridization. The array images with hybridization signals were

acquired and quantified by GeneChip Operating Software (GCOS; Affymetrix).

The quality of the hybridization was assayed using Expressionist (GeneData).

Experiments were repeated for the arrays that failed to pass the quality assay.

Microarray attributes and data preparation

RNA was hybridized to a custom Affymetrix Unigene array with 82,662 probe sets, each

consisting of 16 probes of 25 nucleotides. The 150 CEL files were normalized using the

Robust Multichip Average (RMA) method from the “affy” library of the BioConductor

package (version 2.6) of the R statistical framework (version 2.11.0) (Gentleman et al.,

2004; R Development Core Team, 2010). Probe sets were removed from the data set if

the average probe set signal across three replicates was beneath the detection threshold

(log2(100)=6.64) in all 50 tissue types (26,989 probe sets). ANOVA was used to ensure

that the replicates did not significantly differ. Two arrays - anthers replicate 1 and V1 leaf

replicate 1 - were removed from the analysis. Replicates were then averaged.
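
The detection filter and replicate averaging described above can be sketched in Python. This is a minimal illustration, assuming RMA-normalized log2 signals are already available in a nested dictionary. The names `DETECTION_THRESHOLD`, `average_replicates` and `filter_expressed`, and the toy values, are hypothetical and not part of the original analysis, which was carried out in R.

```python
import math
from statistics import mean

# Detection threshold from the text: log2(100), about 6.64 on the log2 scale.
DETECTION_THRESHOLD = math.log2(100)


def average_replicates(values_by_tissue):
    """Average the replicate signals of one probe set within each tissue."""
    return {tissue: mean(reps) for tissue, reps in values_by_tissue.items()}


def filter_expressed(data):
    """Drop probe sets whose replicate-averaged signal is beneath the
    threshold in every tissue; return averaged profiles for the rest."""
    kept = {}
    for probe_set, values_by_tissue in data.items():
        averaged = average_replicates(values_by_tissue)
        if any(v >= DETECTION_THRESHOLD for v in averaged.values()):
            kept[probe_set] = averaged
    return kept


# Toy data (invented): two probe sets, two tissues, three replicates each.
toy = {
    "ps_a": {"leaf": [7.0, 7.2, 6.9], "root": [5.0, 5.1, 4.9]},
    "ps_b": {"leaf": [5.0, 6.0, 6.5], "root": [6.0, 6.1, 6.2]},
}
expressed = filter_expressed(toy)  # "ps_b" is below threshold in both tissues
```

In this toy case only `ps_a` is retained, because its averaged leaf signal exceeds the threshold.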

The microarray platform was annotated by determining homology between probe

sequences and the cDNA sequences for predicted transcripts from the filtered gene set of

the 4a.53 release of the B73 genome using BLAST (blastn, Altschul et al., 1990). For a

probe to match a transcript, either at least 23 contiguous nucleotides out of 25 had to

match, or 24 of 25 had to match with an internal gap. Probes that matched more than

14 but fewer than 25 nucleotides with 85% identity were noted as close

matches. Close matches were used to identify cross-hybridization among transcripts, as

described below. If 12 of the 16 probes in a probe set matched the same transcript, the

probe set corresponded to that gene. If fewer than twelve but more than one probe in a

probe set was a match or a close match for the same gene, the probe set was identified as

a partial match for that gene.
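
The probe set assignment rule can be written as a short decision function. The sketch below is an interpretation rather than the authors' code: it reads "12 of the 16 probes" as "at least 12", and it treats any probe set with fewer than 12 matches but more than one match or close match as a partial match. The function name `classify_probe_set` and its labels are hypothetical.

```python
def classify_probe_set(n_match, n_close):
    """Classify a 16-probe probe set against one gene.

    n_match: probes that match a transcript of the gene.
    n_close: probes that are only close matches to the gene.
    Assumption: "12 of the 16 probes" is read as "at least 12".
    """
    if n_match >= 12:
        return "match"
    if n_match + n_close > 1:
        return "partial"
    return "none"


# Toy calls (invented counts).
labels = [classify_probe_set(14, 0), classify_probe_set(5, 4), classify_probe_set(1, 0)]
```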

In order to identify expressed transcripts that did not correspond to the filtered gene set of

gene models, we searched genomic DNA for the cDNA sequences from which the probes

were designed. Exonerate (Slater and Birney, 2005), an aligner that uses more exhaustive

heuristics than BLAST, was used to map the probe sets to genomic sequence. The

“--model est2genome” parameter was used, which allows for introns of reasonable length.

These Unigene sequences and their genomic positions were combined with the

previously identified B73 gene models. We eliminated probe sets that cross-hybridize or

are redundant. For redundant collections of probe sets, a single probe set was retained

according to the following criteria: 1) has best alignment, 2) matches more transcripts of

the gene than other probe sets, and 3) has highest maximum expression.

Creating modules of coexpressed genes

All module construction was performed with WGCNA software (Langfelder and

Horvath, 2008). A correlation network is fully specified by its adjacency matrix that

contains the network connection strength between each gene pair. All 34,876 probe sets

were analyzed as a single block. To calculate the adjacency matrix, we first calculated the

Pearson correlation coefficient (r) between each pair of probe sets across all

developmental time points. The adjacency of two genes is proportional to the absolute

value of their correlation coefficients, e.g.:

$$a_{ij} = \left(\frac{1 + s_{ij}}{2}\right)^{\beta}$$

where $a_{ij}$ is the adjacency value of gene i and gene j, $s_{ij}$ is the Pearson correlation

between gene i and gene j, and β is the weight. This coexpression similarity measure

preserves information about negative correlations. The weight serves to highlight the

strongest correlations while reducing the mean connectivity, the average number of

connections per probe set, of the network (Figure S7). Weighted networks are robust with

respect to the choice of the power. Gene coexpression networks have been found to

exhibit a scale-free topology; their connections follow a power decay law such that a

small number of very highly connected nodes exist (Barabasi and Oltvai, 2004; Chung et

al., 2006). Using the scale-free topology criterion as described by Langfelder and

Horvath (2008), we selected a β value of 5 (Figure S7).
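
As an illustration of the adjacency calculation, the following Python sketch computes pairwise Pearson correlations and raises the shifted correlation to the power β. It assumes the signed form a_ij = ((1 + s_ij)/2)^β, which maps a correlation of -1 to 0 and +1 to 1. It is not the WGCNA implementation; the function names and toy profiles are hypothetical, and the diagonal is set to zero here so that a gene is not counted as its own neighbor.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def adjacency_matrix(profiles, beta=5):
    """Signed, weighted adjacency: a_ij = ((1 + s_ij) / 2) ** beta."""
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]  # diagonal left at zero (assumption)
    for i in range(n):
        for j in range(i + 1, n):
            s_ij = pearson(profiles[i], profiles[j])
            adj[i][j] = adj[j][i] = ((1 + s_ij) / 2) ** beta
    return adj


def mean_connectivity(adj):
    """Average summed connection strength per node."""
    return sum(sum(row) for row in adj) / len(adj)


# Toy profiles (invented): gene 1 tracks gene 0, gene 2 mirrors it,
# and gene 3 is uncorrelated with gene 0.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, -1, -1, 1]]
adj = adjacency_matrix(profiles, beta=5)
```

With β = 5, an uncorrelated pair receives an adjacency of 0.5^5 = 0.03125, which shows how the weight suppresses weak relationships while keeping strong positive correlations near 1.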

We used the topological overlap measure (TO) to transform the adjacency matrix to a

coexpression distance matrix using WGCNA in R. While a correlation considers each

pair of genes in isolation, topological overlap considers each pair of genes in relation to

all other genes in the network (Ravasz et al., 2002; Li and Horvath, 2007; Yip and

Horvath, 2007). Two genes have a high TO if they share high correlations with a

common set of other genes. The use of TO filters out spurious or isolated connections

(Oldham et al., 2008). Network relationships among genes were identified using

hierarchical clustering of the dissimilarity matrix (i.e. one minus the topological

overlap matrix). A dynamic tree-cutting algorithm was used to "cut" each dendrogram

and define the modules. The tree-cutting algorithm iteratively decomposes and combines

branches until a stable number of clusters is reached (Langfelder and Horvath, 2008). A

summary profile, or eigengene, was calculated for each module by performing principal

component analysis for each module (Langfelder and Horvath, 2007). The first principal

component of the gene expression matrix for each module was retained as the

representative module eigengene (ME). Forty-nine coexpression modules resulted.
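
The topological overlap step can be sketched as follows. The text does not give the formula, so this example assumes the commonly used weighted definition TO_ij = (Σ_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), where k_i is the summed adjacency of gene i. The function names are hypothetical, and the hierarchical clustering and dynamic tree cut that follow in the analysis are not reproduced here.

```python
def topological_overlap(adj):
    """Weighted topological overlap from a symmetric adjacency matrix
    whose diagonal is zero."""
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Connection strength shared through every other gene u.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j)
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


def dissimilarity(tom):
    """One minus the topological overlap, used as a clustering distance."""
    return [[1.0 - v for v in row] for row in tom]


# Toy adjacency (invented): genes 0-2 are moderately connected, gene 3 is isolated.
toy_adj = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
toy_tom = topological_overlap(toy_adj)
toy_dist = dissimilarity(toy_tom)
```

In the toy matrix, genes 0 and 1 have a topological overlap of 0.5, while the isolated gene 3 has an overlap of 0 with every other gene.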

We validated the 49 modules by examining the average TO for all genes in the module.

The mean TOs of identified modules should be significantly higher than the TOs of

modules comprised of randomly selected genes. We calculated the average TOs of

modules comprised of randomly selected genes by randomly assigning the 34,876 genes

to 49 modules that were the same sizes as the observed modules. This process was

repeated 50,000 times to obtain 49 null distributions. The probability a random set of

genes could generate a TO greater or equal to the observed TO values for a module is the

fraction of 50,000 iterations where the random group of genes had a higher mean TO than

the observed mean TO. Twenty-four of the 49 modules were verified as highly significant

(P < 10⁻⁵; Table S3).
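
The module validation can be illustrated with a small permutation test. This sketch simplifies the procedure: instead of reassigning all genes to 49 modules at once, it draws one random gene set of the module's size per iteration, which gives the same null distribution for a single module. The function names, the iteration count and the toy matrix are hypothetical.

```python
import random


def mean_pairwise_to(tom, members):
    """Mean topological overlap over all gene pairs within a gene set."""
    pairs = [(i, j) for idx, i in enumerate(members) for j in members[idx + 1:]]
    return sum(tom[i][j] for i, j in pairs) / len(pairs)


def module_p_value(tom, members, n_iter=1000, seed=0):
    """Fraction of random, equally sized gene sets whose mean TO is at
    least as high as the observed module mean."""
    rng = random.Random(seed)
    observed = mean_pairwise_to(tom, members)
    genes = list(range(len(tom)))
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(genes, len(members))
        if mean_pairwise_to(tom, sample) >= observed:
            hits += 1
    return hits / n_iter


# Toy TO matrix (invented): genes 0-3 overlap strongly, all other pairs weakly.
n_genes = 8
toy_tom = [[0.1] * n_genes for _ in range(n_genes)]
for i in range(4):
    for j in range(4):
        toy_tom[i][j] = 0.9
p_module = module_p_value(toy_tom, [0, 1, 2, 3])
p_background = module_p_value(toy_tom, [0, 4, 5, 6])
```

The tightly connected gene set yields a small p-value, whereas a set drawn across the weakly connected background does not.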

Testing tissue specific transcript abundance

To test if modules were associated with preferential expression in distinct tissues, arrays

arising from the same tissue at different developmental stages were classified together

(e.g. leaf, root, shoot, etc.; Table I). We created a binary indicator variable (tissue = 1; all

other samples = 0) and determined if any module eigengenes were significantly

correlated with the indicator. Positive correlation between a module eigengene and a

tissue type indicates that probe sets in that module have high transcript levels in that

tissue relative to all other tissues. Negative correlation between a module eigengene and a

tissue type indicates that probe sets in that module have low transcript levels in that tissue

relative to all other tissues. Using the eigengene is similar to averaging the correlations

between the expression profiles for each gene in the module with the tissue type, but

avoids the multiple testing problem. Because modules have varying extents of

heterogeneity in gene expression, not all modules are represented equally well by the

ME. The module membership (MM) for each gene within a module is the Pearson

correlation between the expression level of the gene and the module eigengene (Horvath

and Dong, 2008). MM is a quantitative measure of the degree to which a gene is central

to a module.
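
The eigengene, tissue indicator and module membership calculations can be sketched together. The example below finds the first principal component by power iteration on standardized gene profiles, orients it to follow the average expression profile, and then correlates it with a binary tissue indicator. It is a pure-Python illustration under these assumptions, not the WGCNA routine, and all names and toy expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardize(row):
    """Scale one gene profile to mean 0 and unit variance."""
    n = len(row)
    m = sum(row) / n
    sd = sqrt(sum((v - m) ** 2 for v in row) / n)
    return [(v - m) / sd for v in row]


def module_eigengene(expr, n_iter=100):
    """First principal component across samples, found by power iteration."""
    z = [standardize(row) for row in expr]
    n_samples = len(z[0])
    v = z[0][:]  # start from a gene profile; a constant vector would vanish
    for _ in range(n_iter):
        scores = [sum(g[s] * v[s] for s in range(n_samples)) for g in z]
        v = [sum(z[g][s] * scores[g] for g in range(len(z))) for s in range(n_samples)]
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Orient the eigengene so that it follows the average expression profile.
    mean_profile = [sum(g[s] for g in z) / len(z) for s in range(n_samples)]
    if sum(a * b for a, b in zip(v, mean_profile)) < 0:
        v = [-x for x in v]
    return v


# Toy module (invented): three genes that are high in leaf and low in root.
tissues = ["leaf", "leaf", "leaf", "root", "root", "root"]
expr = [[9, 8, 9, 2, 3, 2], [8, 9, 8.5, 3, 2, 2.5], [7, 9, 8, 1, 2, 3]]
me = module_eigengene(expr)
leaf_indicator = [1 if t == "leaf" else 0 for t in tissues]
r_leaf = pearson(me, leaf_indicator)                 # eigengene vs tissue
module_membership = [pearson(g, me) for g in expr]   # MM for each gene
```

Here the eigengene correlates strongly and positively with the leaf indicator, and each gene has a module membership close to 1.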

We used a similar approach to identify modules specific to tissues at different

developmental stages. We divided leaf tissues into two groups, leaves harvested prior to

anthesis and silking (V1, V2, and V5), and leaves harvested after flowering (R1 and 10,

17, 24, and 31 DAP; Table I). We assigned all samples a value of zero except those from

the group under consideration, which were assigned one, and we performed module

correlations as described above. To investigate modules that correlate with embryo

development, we assigned non-embryo tissues zero, and embryo samples were assigned

either 10, 17, 24, or 31, based on the number of days between pollination and the day

they were sampled. Correlations were performed as above.
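
The trait vectors used for these stage-specific correlations can be built as shown below. This is a sketch that assumes each sample is labelled with a (stage, tissue) pair following Table I; the function and constant names are hypothetical.

```python
VEGETATIVE_STAGES = {"V1", "V2", "V5"}
REPRODUCTIVE_STAGES = {"R1", "10DAP", "17DAP", "24DAP", "31DAP"}


def leaf_group_indicator(samples, stages):
    """1 for leaf samples harvested at one of the given stages, else 0."""
    return [1 if tissue == "leaf" and stage in stages else 0
            for stage, tissue in samples]


def embryo_dap_trait(samples):
    """Days after pollination for embryo samples, 0 for all other samples."""
    return [int(stage[:-3]) if tissue == "embryo" else 0
            for stage, tissue in samples]


# Toy sample list (a subset of the stage and tissue labels in Table I).
samples = [("V1", "leaf"), ("R1", "leaf"), ("10DAP", "embryo"),
           ("31DAP", "embryo"), ("V2", "stalk"), ("VE", "leaf")]
vegetative = leaf_group_indicator(samples, VEGETATIVE_STAGES)
reproductive = leaf_group_indicator(samples, REPRODUCTIVE_STAGES)
embryo_days = embryo_dap_trait(samples)
```

Each vector would then be correlated with the module eigengenes in the same way as the tissue indicators.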

GO enrichment analysis

GO enrichment analysis of modules was performed using the “topGO” package in R. We

used GO annotations derived from BLAST2GO (Conesa et al., 2005), using the cDNA

sequences used for the design of the Affymetrix microarray platform and the NCBI ‘nr’

database (July 16, 2009). 17,139 genes were assigned 2,684 unique GO terms. For each

module, Fisher’s Exact Test was used to identify GO terms that occur more frequently

than expected given the frequency of the GO terms among all of the genes in the analysis.

The elim method was applied to remove higher level GO terms from probe sets with

significant lower-level annotations, which has been shown to reduce the rate of false

positives (Alexa et al., 2006). Within each module, we selected genes with a module

membership greater than 0.5 as these genes most resemble the module eigengene. We

defined a GO biological process as significantly associated with a module if it had a P

value less than 0.001. This strict criterion was used to eliminate GO terms present 1 or 2

times within a module but that nonetheless were highly significant because of the low

frequency of genes with that GO term within the data set.
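
The enrichment test for a single GO term can be sketched with exact binomial coefficients. The example selects genes with module membership above 0.5 and computes a one-sided Fisher's exact test as a hypergeometric upper tail. It does not implement the elim method of topGO, and the function names and toy numbers are hypothetical.

```python
from math import comb


def core_genes(module_membership, cutoff=0.5):
    """Genes whose module membership exceeds the cutoff used in the text."""
    return [g for g, mm in module_membership.items() if mm > cutoff]


def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k) when n genes are selected
    from N annotated genes, K of which carry the GO term."""
    upper = min(n, K)
    favourable = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers (invented): 20 genes in total, 5 carry the term,
# and 3 of the 5 selected module genes carry it.
selected = core_genes({"g1": 0.9, "g2": 0.4, "g3": 0.51})
p_term = enrichment_p(k=3, n=5, K=5, N=20)
significant = p_term < 0.001  # threshold used in the text
```

In this toy case the p-value is 1126/15504, roughly 0.073, so the term would not pass the 0.001 threshold.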

The module membership of transcription-related genes

To investigate the centrality of transcription factors within modules, probe sets in each

module were ordered by their module membership (MM) score. Of 17,139 genes with

GO annotations, we identified 1,063 genes with transcription-related GO terms (e.g.

“transcription activator activity”, “transcription cofactor activity”, “transcription

initiation, DNA-dependent”, etc.; Table S6). The rank of the transcription-related

annotated gene with the highest MM was recorded. To determine whether this rank was

higher than expected, we compared each module’s rank to a distribution of the highest

rank of a transcription-related gene obtained by randomizing the order of genes within the

module for 100,000 iterations. We also permuted the order of a random set of genes with

the same size as the module, and determined the highest rank of a transcription-related

gene in this data set.
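
The rank permutation can be sketched as follows. Genes are assumed to be ordered by decreasing module membership, so rank 1 is the most central gene, and the p-value is taken as the fraction of shuffled orders in which the best transcription-related gene ranks at least as well as observed. The function names, iteration count and toy gene lists are hypothetical.

```python
import random


def best_tf_rank(ordered_genes, tf_genes):
    """Rank (1 = highest module membership) of the first gene in the
    ordering that carries a transcription-related annotation."""
    for rank, gene in enumerate(ordered_genes, start=1):
        if gene in tf_genes:
            return rank
    raise ValueError("no transcription-related gene in this module")


def tf_rank_p_value(ordered_genes, tf_genes, n_iter=10000, seed=0):
    """Fraction of random gene orders whose best transcription-related
    rank is at least as good as the observed one."""
    rng = random.Random(seed)
    observed = best_tf_rank(ordered_genes, tf_genes)
    shuffled = list(ordered_genes)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        if best_tf_rank(shuffled, tf_genes) <= observed:
            hits += 1
    return hits / n_iter


# Toy module (invented): 20 genes ordered by MM, two with transcription terms.
genes = [f"g{i}" for i in range(1, 21)]
p_top = tf_rank_p_value(genes, {"g1", "g10"})      # best observed rank is 1
p_bottom = tf_rank_p_value(genes, {"g19", "g20"})  # best observed rank is 19
```

With two annotated genes among 20, a best rank of 1 is expected by chance in about 10% of shuffles, so `p_top` is close to 0.1.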

Analysis of promoter motif enrichment within modules

To determine whether modules were enriched for genes containing particular

cis-promoter motifs, we counted the number of genes within each module that contained

each of 106 promoter motifs obtained from plantCARE (Rombauts et al., 1999), PLACE

(Higo et al., 1999), and GRASSIUS (Yilmaz et al., 2009). All motifs have been reported

to be transcription factor binding sites in maize. This analysis was limited to the 13,047

genes that had been mapped to the filtered set of maize gene models. We used Fisher’s

exact test with a significance threshold of P < 0.01 to compare the number of genes in the

module that contain the promoter sequence within 500 bp upstream of the transcription

start site with the number of genes not in the module with that sequence. Transcription

start sites were determined based on the transcript that mapped most upstream in cases

where multiple transcripts were annotated. We also examined promoter regions of

1000 bp, 1500 bp and 2000 bp.
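
The motif search and enrichment test can be sketched in Python. The example converts motifs written with alternatives such as (A/G) into regular expressions, scans the bases nearest the transcription start site, and applies a one-sided Fisher's exact test. It assumes that promoter strings are oriented 5' to 3' and end at the transcription start site, and that only the given strand is searched; neither detail is stated in the text. All names and sequences are hypothetical.

```python
import re
from math import comb


def motif_to_regex(motif):
    """Turn a motif such as 'CC(A/G)CCC' into the regex 'CC[AG]CCC'."""
    pattern = re.sub(r"\(([ACGT/]+)\)",
                     lambda m: "[" + m.group(1).replace("/", "") + "]", motif)
    return re.compile(pattern)


def has_motif(promoter, motif, window=500):
    """True if the motif occurs within `window` bp upstream of the TSS."""
    return motif_to_regex(motif).search(promoter[-window:]) is not None


def motif_enrichment_p(promoters, module_genes, motif, window=500):
    """One-sided Fisher's exact test for motif-containing genes in a module."""
    with_motif = {g for g, seq in promoters.items() if has_motif(seq, motif, window)}
    N, K = len(promoters), len(with_motif)
    n = len(module_genes)
    k = len(with_motif & set(module_genes))
    favourable = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return favourable / comb(N, n)


# Toy promoters (invented sequences): g1 and g2 contain CC(A/G)CCC.
promoters = {
    "g1": "AAAACCGCCCAAAA",
    "g2": "TTTTCCACCCTT",
    "g3": "ACGTACGTACGT",
    "g4": "GGGGTTTTAAAA",
}
p_motif = motif_enrichment_p(promoters, {"g1", "g2"}, "CC(A/G)CCC")
```

Passing a larger `window` (for example 1000, 1500 or 2000) repeats the test for the longer promoter regions.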

Physical clustering of module genes

To determine whether the genes in our modules were clustered in the maize genome, we

compared the observed dispersion of gene density with the expected dispersion of gene

density for each module. The dispersion statistic (mean divided by variance) was

calculated by counting module genes in 300kb sliding windows with a step size of 100kb

and recording the mean gene density and the variance of gene density. To calculate the

expected dispersion statistic, we randomly shuffled the module assignments of genes

100,000 times and determined a null distribution by recording the dispersion statistic for

each permuted data set. This procedure was applied to the 13 modules that had at least

200 genes in the module for which the genomic position was known.
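
The dispersion statistic and its permutation test can be sketched for a single chromosome. The example counts module genes in 300 kb windows with a 100 kb step, divides the mean count by the sample variance, and compares the result with random gene sets of the same size. The use of the sample variance, the handling of window edges, the single-chromosome layout and all names are assumptions made for illustration.

```python
import random
from statistics import mean, variance


def window_counts(positions, chrom_length, window=300_000, step=100_000):
    """Count genes in each sliding window that fits on the chromosome."""
    counts = []
    start = 0
    while start + window <= chrom_length:
        counts.append(sum(1 for p in positions if start <= p < start + window))
        start += step
    return counts


def dispersion(positions, chrom_length):
    """Mean window count divided by the variance of window counts.
    Lower values indicate stronger physical clustering."""
    counts = window_counts(positions, chrom_length)
    var = variance(counts)
    return float("inf") if var == 0 else mean(counts) / var


def clustering_p_value(module_positions, all_positions, chrom_length,
                       n_iter=2000, seed=0):
    """Fraction of random gene sets with equal or lower dispersion."""
    rng = random.Random(seed)
    observed = dispersion(module_positions, chrom_length)
    hits = 0
    for _ in range(n_iter):
        shuffled = rng.sample(all_positions, len(module_positions))
        if dispersion(shuffled, chrom_length) <= observed:
            hits += 1
    return hits / n_iter


# Toy chromosome (invented): 1 Mb, with three tightly clustered module genes
# and 17 other genes spaced every 50 kb.
CHROM_LENGTH = 1_000_000
clustered = [50_000, 60_000, 70_000]
background = list(range(150_000, 1_000_000, 50_000))
p_cluster = clustering_p_value(clustered, clustered + background, CHROM_LENGTH)
```

The clustered gene set has a dispersion of 1/3, well below that of evenly spaced genes, and receives a small permutation p-value.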

ACKNOWLEDGEMENTS

This work was made possible by the facilities of the Shared Hierarchical Academic Research

Computing Network (SHARCNET: www.sharcnet.ca) and Compute/Calcul Canada.

LITERATURE CITED

Alexa A, Rahnenführer J, Lengauer T (2006) Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22: 1600-1607

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410

Barabasi A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101-113

Belanger FC, Leustek T, Chu B, Kriz AL (1995) Evidence for the thiamine biosynthetic pathway in higher-plant plastids and its developmental regulation. Plant Molecular Biology 29: 809-821

Beló A, Beatty MK, Hondred D, Fengler KA, Li B, Rafalski A (2010) Allelic genome structural variations in maize detected by array comparative genome hybridization. Theor Appl Genet 120: 355-367

Benedito VA, Torres-Jerez I, Murray JD, Andriankaja A, Allen S, Kakar K, Wandrey M, Verdier J, Zuber H, Ott T, Moreau S, Niebel A, Frickey T, Weiller G, He J, Dai X, Zhao PX, Tang Y, Udvardi MK (2008) A gene expression atlas of the model legume Medicago truncatula. The Plant Journal 55: 504-513

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685-1690

Brady SM, Zhang L, Megraw M, Martinez NJ, Jiang E, Yi CS, Liu W, Zeng A, Taylor-Teeples M, Kim D, Ahnert S, Ohler U, Ware D, Walhout AJM, Benfey PN (2011) A stele-enriched gene regulatory network in the Arabidopsis root. Mol Syst Biol 7: 459

Caron H, van Schaik B, van der Mee M, Baas F, Riggins G, van Sluis P, Hermus M-C, van Asperen R, Boon K, Voûte PA, Heisterkamp S, van Kampen A, Versteeg R (2001) The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291: 1289-1292

Chung W-Y, Albert R, Albert I, Nekrutenko A, Makova K (2006) Rapid and asymmetric divergence of duplicate genes in the human gene coexpression network. BMC Bioinformatics 7: 46

Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674-3676

Druka A, Muehlbauer G, Druka I, Caldo R, Baumann U, Rostoks N, Schreiber A, Wise R, Close T, Kleinhofs A, Graner A, Schulman A, Langridge P, Sato K, Hayes P, McNicol J, Marshall D, Waugh R (2006) An atlas of gene expression from seed to seed through barley development. Functional & Integrative Genomics 6: 202-211

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244-1256

Ficklin SP, Luo F, Feltus FA (2010) The association of multiple interacting genes with specific phenotypes in rice using gene coexpression networks. Plant Physiology 154: 13-24

Fierro AC, Vandenbussche F, Engelen K, Van de Peer Y, Marchal K (2008) Meta analysis of gene expression data within and across species. Curr Genomics 9: 525-534

Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang J, Zhang J (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 5: R80

Higo K, Ugawa Y, Iwamoto M, Korenaga T (1999) Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res 27: 297-300

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Inghelandt D, Reif JC, Dhillon BS, Flament P, Melchinger AE (2011) Extent and genome-wide distribution of linkage disequilibrium in commercial maize germplasm. Theoretical and Applied Genetics 123: 11-20

Jiao Y, Lori Tausta S, Gandotra N, Sun N, Liu T, Clay NK, Ceserani T, Chen M, Ma L, Holford M, Zhang H-y, Zhao H, Deng X-W, Nelson T (2009) A transcriptome atlas of rice cell types uncovers cellular, functional and developmental hierarchies. Nat Genet 41: 258-263

Kaufmann K, Pajoro A, Angenent GC (2010) Regulation of transcription in plants: mechanisms controlling developmental switches. Nat Rev Genet 11: 830-842

Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 1: 54

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Lercher MJ, Urrutia AO, Hurst LD (2002) Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet 31: 180-183

Li A, Horvath S (2007) Network neighborhood analysis with the multi-node topological overlap measure. Bioinformatics 23: 222-231

Luscombe NM, Madan Babu M, Yu H, Snyder M, Teichmann SA, Gerstein M (2004) Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 431: 308-312

Lysenko E (2007) Plant sigma factors and their role in plastid transcription. Plant Cell Reports 26: 845-859

Ma L, Sun N, Liu X, Jiao Y, Zhao H, Deng XW (2005) Organ-specific expression of Arabidopsis genome during development. Plant Physiology 138: 80-91

Mochida K, Uehara-Yamaguchi Y, Yoshida T, Sakurai T, Shinozaki K (2011) Global landscape of a co-expressed gene network in barley and its application to gene discovery in Triticeae crops. Plant and Cell Physiology 52: 785-803

Moreno-Risueno MA, Busch W, Benfey PN (2010) Omics meet networks -- using systems approaches to infer regulatory networks in plants. Current Opinion in Plant Biology 13: 126-131

Morishima A (1998) Identification of preferred binding sites of a light-inducible DNA-binding factor (MNF1) within 5′-upstream sequence of C4-type phosphoenolpyruvate carboxylase gene in maize. Plant Molecular Biology 38: 633-646

Morohashi K, Casas MI, Falcone Ferreyra L, Mejia-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, Rodriguez E, Pellegrinet S, McMullen M, Casati P, Grotewold E (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-2764

Movahedi S, Van de Peer Y, Vandepoele K (2011) Comparative network analysis reveals that tissue specificity and gene function are important factors influencing the mode of expression evolution in Arabidopsis and rice. Plant Physiology

Mutwil M, Klie S, Tohge T, Giorgi FM, Wilkins O, Campbell MM, Fernie AR, Usadel B, Nikoloski Z, Persson S (2011) PlaNet: combined sequence and expression comparisons across plant networks derived from seven species. The Plant Cell Online 23: 895-910

Ohad N, Margossian L, Hsu YC, Williams C, Repetti P, Fischer RL (1996) A mutation that allows endosperm development without fertilization. Proceedings of the National Academy of Sciences 93: 5319-5324

Oldham MC, Konopka G, Iwamoto K, Langfelder P, Kato T, Horvath S, Geschwind DH (2008) Functional organization of the transcriptome in human brain. Nat Neurosci 11: 1271-1282

R Development Core Team (2010) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria

Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297: 1551-1555

Rombauts S, Dehais P, Van Montagu M, Rouze P (1999) PlantCARE, a plant cis-acting regulatory element database. Nucleic Acids Res 27: 295-296

Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Schölkopf B, Weigel D, Lohmann JU (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37: 501-506

Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, Minx P, Reily AD, Courtney L, Kruchowski SS, Tomlinson C, Strong C, Delehaunty K, Fronick C, Courtney B, Rock SM, Belter E, Du F, Kim K, Abbott RM, Cotton M, Levy A, Marchetto P, Ochoa K, Jackson SM, Gillam B, Chen W, Yan L, Higginbotham J, Cardenas M, Waligorski J, Applebaum E, Phelps L, Falcone J, Kanchi K, Thane T, Scimone A, Thane N, Henke J, Wang T, Ruppert J, Shah N, Rotter K, Hodges J, Ingenthron E, Cordes M, Kohlberg S, Sgro J, Delgado B, Mead K, Chinwalla A, Leonard S, Crouse K, Collura K, Kudrna D, Currie J, He R, Angelova A, Rajasekar S, Mueller T, Lomeli R, Scara G, Ko A, Delaney K, Wissotski M, Lopez G, Campos D, Braidotti M, Ashley E, Golser W, Kim H, Lee S, Lin J, Dujmic Z, Kim W, Talag J, Zuccolo A, Fan C, Sebastian A, Kramer M, Spiegel L, Nascimento L, Zutavern T, Miller B, Ambroise C, Muller S, Spooner W, Narechania A, Ren L, Wei S, Kumari S, Faga B, Levy MJ, McMahan L, Van Buren P, Vaughn MW, Ying K, Yeh C-T, Emrich SJ, Jia Y, Kalyanaraman A, Hsia A-P, Barbazuk WB, Baucom RS, Brutnell TP,

Carpita NC, Chaparro C, Chia J-M, Deragon J-M, Estill JC, Fu Y, Jeddeloh JA, Han Y, Lee H, Li P, Lisch DR, Liu S, Liu Z, Nagel DH, McCann MC, SanMiguel P, Myers AM, Nettleton D, Nguyen J, Penning BW, Ponnala L, Schneider KL, Schwartz DC, Sharma A, Soderlund C, Springer NM, Sun Q, Wang H, Waterman M, Westerman R, Wolfgruber TK, Yang L, Yu Y, Zhang L, Zhou S, Zhu Q, Bennetzen JL, Dawe RK, Jiang J, Jiang N, Presting GG, Wessler SR, Aluru S, Martienssen RA, Clifton SW, McCombie WR, Wing RA, Wilson RK (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Sekhon RS, Lin H, Childs KL, Hansey CN, Buell CR, de Leon N, Kaeppler SM (2011) Genome-wide atlas of transcription during maize development. The Plant Journal 66: 553-563

Slater G, Birney E (2005) Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6: 31

Smaczniak C, Immink RG, Muino JM, Blanvillain R, Busscher M, Busscher-Lange J, Dinh QD, Liu S, Westphal AH, Boeren S, Parcy F, Xu L, Carles CC, Angenent GC, Kaufmann K (2012) Characterization of MADS-domain transcription factor complexes in Arabidopsis flower development. Proc Natl Acad Sci U S A 109: 1560-1565

Smith LG, Hake S, Sylvester AW (1996) The tangled-1 mutation alters cell division orientations throughout maize leaf development without altering leaf shape. Development 122: 481-489

Springer NM, Danilevskaya ON, Hermon P, Helentjaris TG, Phillips RL, Kaeppler HF, Kaeppler SM (2002) Sequence relationships, conserved domains, and expression patterns for maize homologs of the polycomb group genes E(z), esc, and E(Pc). Plant Physiology 128: 1332-1345

Springer NM, Ying K, Fu Y, Ji T, Yeh C-T, Jia Y, Wu W, Richmond T, Kitzman J, Rosenbaum H, Iniguez AL, Barbazuk WB, Jeddeloh JA, Nettleton D, Schnable PS (2009) Maize inbreds exhibit high levels of copy number variation (CNV) and presence/absence variation (PAV) in genome content. PLoS Genet 5: e1000734

Sung S, Amasino RM (2004) Vernalization in Arabidopsis thaliana is mediated by the PHD finger protein VIN3. Nature 427: 159-164

Swanson-Wagner RA, Eichten SR, Kumari S, Tiffin P, Stein JC, Ware D, Springer NM (2010) Pervasive gene content variation and copy number variation in maize and its undomesticated progenitor. Genome Research

Udvardy A, Maine E, Schedl P (1985) The 87A7 chromomere: identification of novel chromatin structures flanking the heat shock locus that may define the boundaries of higher order domains. Journal of Molecular Biology 185: 341-358

Vicente-Carbajosa J, Moose SP, Parsons RL, Schmidt RJ (1997) A maize zinc-finger protein binds the prolamin box in zein gene promoters and interacts with the basic leucine zipper transcriptional activator Opaque2. Proceedings of the National Academy of Sciences 94: 7685-7690

Yilmaz A, Nishiyama MY, Fuentes BG, Souza GM, Janies D, Gray J, Grotewold E (2009) GRASSIUS: A Platform for Comparative Regulatory Genomics across the Grasses. Plant Physiology 149: 171-180

Yip AM, Horvath S (2007) Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinformatics 8: 22

Zhan S, Horrocks J, Lukens LN (2006) Islands of co-expressed neighbouring genes in Arabidopsis thaliana suggest higher-order chromosome domains. The Plant Journal 45: 347-357

Zhan S, Lukens L (2010) Identification of novel miRNAs and miRNA dependent developmental shifts of gene expression in Arabidopsis thaliana. PLoS One 5: e10157

Zhu J, Zhang B, Smith EN, Drees B, Brem RB, Kruglyak L, Bumgarner RE, Schadt EE (2008) Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nat Genet 40: 854-861

Figure Legends

Figure 1: Images of selected tissues at time of sampling. Images of all 50 tissues can be found in Figure S6.

Figure 2: Flow chart detailing the annotation of the array platform. Probe sets which were predicted to cross-hybridize, were redundant, or did not show expression were removed from the analysis. The maize transcriptome network was constructed from 34,876 probe sets, of which 12,089 were unmapped, and 22,787 were mapped to B73 maize gene models or unannotated regions of the maize B73 genome.

Figure 3: Average transcript abundance from 50 arrays (three biological replicates of each tissue/stage) were clustered using the flashClust module from WGCNA. Developmental stages of the same tissue tend to cluster together. See Table I for definitions of developmental stages.

Figure 4: A heat map of module eigengene and tissue correlations. Boxes contain Pearson correlation coefficients and their associated p-values. A strong positive correlation (red) indicates that the ME has higher expression in the given tissue relative to all other tissues. A strong negative correlation (blue) indicates low expression in the given tissue relative to all other tissues. Tissues were classified into groups as described in Table I.

Figure 5: Module eigengene correlations with leaves sampled before and after anthesis and silking. All_Leaf_Stages represents all nine leaf samples. Vegetative leaves are the V1, V2, and V5 leaves. Reproductive leaves are samples from the second leaf above the top ear (R1 and 10, 17, 24, and 31 DAP). See Table I for definitions of developmental stages.

Table I: Description of developmental stages and tissues sampled for microarray analyses.

| Developmental stage a | Tissue/organ | Tissue group | Number of visible leaves at sampling | Detail of the harvested sample |
| --- | --- | --- | --- | --- |
| VE | leaf | leaf | 0 | coleoptile |
| VE | seminal root | root | 0 | root |
| V1 | leaf | leaf | 2 | 1st & 2nd leaf |
| V1 | seminal root | root | 2 | root |
| V2 | seminal root | root | 4 | seminal root |
| V2 | nodal root | root | 4 | nodal root |
| V2 | stalk | stalk | 4 | stalk |
| V2 | leaf | leaf | 4 | leaf (actively growing leaf – 4th leaf) |
| V4 | tassel | tassel | 6 | 1mm tassel meristem & 1mm uppermost stem below tassel |
| V5 | seminal root | root | 8 | seminal root |
| V5 | nodal root | root | 8 | nodal root |
| V5 | stalk | stalk | 8 | stalk below tassel (2cm) |
| V5 | leaf | leaf | 8 | leaf (actively growing leaf – 8th leaf, 15cm including tip) |
| V5 | tassel | tassel | 8 | tassel 3-5mm |
| V7 | ear | ear | 12 | top ear shoot |
| V7 | tassel | tassel | 12 | tassel 2cm |
| V8~V9 | tassel | tassel | 13~14 | tassel 12~14 cm |
| V8~V9 | ear | ear | 13~14 | top ear 3~5mm |
| V10~V11 | tassel | tassel | 15~16 | top 10cm of tassel (~20cm) |
| V10~V11 | ear | ear | 15~16 | top ear 1~1.5cm |
| V13~V15 | tassel | tassel | 15~16 | spikelet of tassel (~22cm) |
| V13~V15 | ear | ear | 15~16 | top ear 3~3.5cm |
| V15~V16 | floret | floret | 15~16 | top ear (5cm) floret |
| V15~V16 | cob | cob | 15~16 | top ear (5cm) cob |
| V15~V16 | silk | silk | 15~16 | top ear (5cm) silk |
| V15~V16 | tassel | tassel | 15~16 | spikelet of tassel (top 10cm) |
| VT | anthers | anthers | 15~16 | anther |
| R1 | ovule | ovule | 15~16 | R1-ovule of top ear |
| R1 | cob | cob | 15~16 | R1-cob of top ear |
| R1 | silk | silk | 15~16 | R1-silk of top ear |
| R1 | husk | husk | 15~16 | R1-most inner husk of top ear |
| R1 | leaf | leaf | 15~16 | R1-15cm tip of 2nd leaf above top ear |
| R1 | nodal root | root | 15~16 | R1-adult root |
| R1 | stalk | stalk | 15~16 | R1-15cm stalk below tassel |
| 5DAP | ovule | ovule | 15~16 | ovule of top ear |
| 5DAP | cob | cob | 15~16 | cob of top ear |
| 10DAP | embryo | embryo | 15~16 | embryo of top ear |
| 10DAP | endosperm | endosperm | 15~16 | endosperm of top ear |
| 10DAP | leaf | leaf | 15~16 | 15cm tip of 2nd leaf above top ear |
| 17DAP | embryo | embryo | 15~16 | embryo of top ear |
| 17DAP | endosperm | endosperm | 15~16 | endosperm of top ear |
| 17DAP | pericarp | pericarp | 15~16 | pericarp of top ear |
| 17DAP | leaf | leaf | 15~16 | 15cm tip of 2nd leaf above top ear |
| 24DAP | leaf | leaf | 15~16 | 15cm tip of 2nd leaf above top ear |
| 24DAP | nodal root | root | 15~16 | nodal root |
| 24DAP | pericarp | pericarp | 15~16 | pericarp of top ear |
| 24DAP | embryo | embryo | 15~16 | embryo of top ear |
| 24DAP | endosperm | endosperm | 15~16 | endosperm of top ear |
| 31DAP | leaf | leaf | 15~16 | 15cm tip of 2nd leaf above top ear |
| 31DAP | embryo | embryo | 15~16 | embryo of top ear |

a We also measured days after seeding for developmental stages (VE – 4, V1 – 7, V2 – 14, V4 – 17, V5 – 27, V7 – 34, V8-V9 – 41, V10 – 48, V15 – 51, V16 – 54, VT – 54). R1 – one day before pollination. All DAP (days after pollination) samples were harvested based on date of pollination. All samples were collected around noon. The VE stage is germination and emergence. The Vn leaf stage refers to when the collar of the nth leaf is visible.

Table II: Numbers of tissue-specific genes reported by Sekhon et al. (2011) that are classified into tissue-specific modules.

| Tissue a | Sekhon Genes b | Module Genes c | No. agree d | Proportion |
| --- | --- | --- | --- | --- |
| Cob | 4 | 0 | 0 | -- |
| Embryo | 48 | 23 | 15 | 0.65 |
| Endosperm | 168 | 48 | 35 | 0.73 |
| Internode | 12 | 3 | 0 | 0.00 |
| Leaf | 334 | 109 | 92 | 0.84 |
| Root | 151 | 39 | 29 | 0.74 |
| Silk | 12 | 5 | 0 | 0.00 |
| Tassel | 134 | 49 | 35 | 0.71 |
| Total | 863 | 276 | 206 | 0.75 |

a Sekhon et al. 2011 reported tissue-specific genes in these eight tissues.

b Count of genes reported by Sekhon et al. 2011 in each tissue.

c Count of genes in the present study that were mapped to Sekhon et al.’s tissue-specific genes.

d Number of genes in the present study that were in tissue-specific modules that corresponded to Sekhon et al.’s tissue.


Table III: Modules with over-represented promoter motifs.

Promoter Motif   Module p-value (x 10^-3) a
Module columns, in order: Zm_mod01 b, Zm_mod02, Zm_mod04 c, Zm_mod06 d, Zm_mod10 e, Zm_mod23 b

AAAG             6.471
AATAAA           6.482
CC(A/G)CCC       0.000  2.099
CCCCCG           0.525  0.003
CCCCGG           9.448
CGCGCC           0.021
GCCCCGG          0.744
TGGTTT           4.376
TTTAAAAA         8.545
(A/G)CCGAC       1.665

a p-values are calculated from Fisher’s Exact Test (α < 0.01).
b module is negatively correlated with leaf tissue.
c module is negatively correlated with leaf and positively correlated with ear.
d module is positively correlated with leaf tissue.
e module is positively correlated with root tissue.


Table IV: Dispersion scores of network modules with more than 200 genes.

Module     Number of genes   Mean a   Variance b   Dispersion (observed)   Mean dispersion (randomized)   p-value c
Zm_mod01   1839              0.2670   0.3143       0.8495                  0.8790                         0.0303 d
Zm_mod02   1424              0.2066   0.2271       0.9100                  0.9038                         0.6475
Zm_mod03   1587              0.2304   0.2590       0.8895                  0.8940                         0.3899
Zm_mod04   1526              0.2216   0.2431       0.9116                  0.8976                         0.8105
Zm_mod05    641              0.0930   0.0983       0.9462                  0.9545                         0.3085
Zm_mod06    963              0.1398   0.1540       0.9075                  0.9330                         0.0657
Zm_mod07    807              0.1172   0.1263       0.9278                  0.9432                         0.1824
Zm_mod08    533              0.0774   0.0807       0.9590                  0.9619                         0.4353
Zm_mod09    580              0.0842   0.0871       0.9669                  0.9587                         0.6873
Zm_mod10    497              0.0721   0.0773       0.9319                  0.9645                         0.0322 d
Zm_mod11    484              0.0703   0.0755       0.9307                  0.9655                         0.0269 d
Zm_mod12    417              0.0604   0.0631       0.9564                  0.9701                         0.2068
Zm_mod13    251              0.0364   0.0376       0.9684                  0.9819                         0.2304

a the mean number of module genes within a 300kb region (100kb step value).
b the variance of module gene counts within a 300kb region.
c p-value – the probability of finding an equal or lower dispersion in a sample of 100,000 networks where genes are assigned to modules at random.
d significance level: P<0.05
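The observed dispersion values in Table IV are consistent, up to rounding, with the ratio of the Mean column to the Variance column (for Zm_mod01, 0.2670/0.3143 ≈ 0.8495). The Python sketch below illustrates such a score with a 300 kb sliding window and a 100 kb step, together with a permutation p-value. It is a simplified illustration, not the authors' implementation: the single toy chromosome, the gene positions, the use of the population variance, the random sampling scheme, and all function names are assumptions.

```python
import random
from statistics import mean, pvariance


def window_counts(positions, chrom_len, window=300_000, step=100_000):
    """Count module genes in each sliding window along one chromosome."""
    counts = []
    start = 0
    while start + window <= chrom_len:
        counts.append(sum(start <= p < start + window for p in positions))
        start += step
    return counts


def dispersion(positions, chrom_len):
    """Mean/variance ratio of window counts; lower values mean more clustering."""
    counts = window_counts(positions, chrom_len)
    return mean(counts) / pvariance(counts)


def permutation_p(all_positions, module_positions, chrom_len, n_iter=200, seed=0):
    """Fraction of random gene sets with dispersion <= the observed value."""
    rng = random.Random(seed)
    observed = dispersion(module_positions, chrom_len)
    k = len(module_positions)
    hits = sum(
        dispersion(rng.sample(all_positions, k), chrom_len) <= observed
        for _ in range(n_iter)
    )
    return observed, hits / n_iter


# Hypothetical data: 200 evenly spaced genes on a 10 Mb chromosome,
# with a 20-gene module packed into a single 1 Mb region.
chrom_len = 10_000_000
all_positions = list(range(0, chrom_len, 50_000))
module_positions = all_positions[20:40]
observed, p_value = permutation_p(all_positions, module_positions, chrom_len)
```

In this toy case the module genes are physically clustered, so the window counts have a large variance relative to their mean and the dispersion falls well below that of randomly drawn gene sets.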


Supplemental Figure Legends

Figure S1. 150 arrays (three biological replicates of each tissue/stage) were clustered using the flashClust module from WGCNA. Array replicates often form tight clusters, and developmental stages of the same tissue tend to cluster together. See Table I for definitions of developmental stages.

Figure S2. A dendrogram and heatmap of module eigengene correlations.

Figure S3. Pathways contained within the leaf-associated, Zm_mod06 module. Three near-complete pathways associated with photosynthesis are represented by genes in this module (chlorophyllide a biosynthesis I, phytyl diphosphate biosynthesis, and chlorophyll a biosynthesis II).

Figure S4. A heat map of module eigengene and sample correlations. Boxes contain Pearson correlation coefficients and their associated p-values. A strong positive correlation (red) indicates that the ME has higher expression in the given tissue relative to all other tissues. A strong negative correlation (blue) indicates low expression in the given tissue relative to all other tissues.

Figure S5. Expression of two genes in the embryo-associated Zm_mod13 module. A. Transcripts of the unannotated gene Zm028519_at are highly correlated with the ME. Note that the point at 0,0 represents no detectable expression of Zm028519_at in 46 non-embryo tissues. B. GRMZM2G074097 (“Thiazole biosynthetic enzyme 1-2, chloroplastic”) follows a pattern of increasing transcript abundance during embryo development.

Figure S6. Images of 50 tissues at time of sampling.

Figure S7. Soft-thresholding plots. A. Plot of scale independence with different weights. Plot of a range of values for β versus the fit of the resulting network to a scale-free topology. The horizontal line (r=0.75) represents best fit to a scale-free topology. B. Mean connectivity. Plot of a range of values for β versus the mean connectivity of the resulting network. As β increases the average number of connected nodes in the network decays.
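The decay of mean connectivity with β described for Figure S7B can be illustrated with a small Python sketch. The analysis itself used WGCNA; this example only assumes the common unsigned soft-threshold adjacency, a_ij = |cor_ij|^β, and a hypothetical 4 x 4 correlation matrix. Whether the network was signed or unsigned is not stated in this section, and the scale-free fit of panel A is not computed here.

```python
def soft_threshold_adjacency(cor, beta):
    """Unsigned adjacency a_ij = |cor_ij| ** beta, with a zero diagonal."""
    n = len(cor)
    return [
        [0.0 if i == j else abs(cor[i][j]) ** beta for j in range(n)]
        for i in range(n)
    ]


def mean_connectivity(adj):
    """Average over nodes of the connectivity k_i = sum_j a_ij."""
    return sum(sum(row) for row in adj) / len(adj)


# Hypothetical correlation matrix for four genes (illustrative values only).
cor = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, -0.2],
    [0.8, 0.7, 1.0, 0.05],
    [0.1, -0.2, 0.05, 1.0],
]

betas = [1, 2, 4, 6, 8, 10, 12]
connectivity = [mean_connectivity(soft_threshold_adjacency(cor, b)) for b in betas]
```

Because every off-diagonal |cor_ij| is below 1, raising it to a larger power shrinks each edge weight, so the mean connectivity decreases monotonically as β grows.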


Supplemental Tables

Supplemental Table 1. Expression profiles across arrays for 34,876 probe sets.

Supplemental Table 2. The 34,876 probe sets are annotated with module memberships. The table also notes whether genes putatively encode a protein with a function in transcription (0 = No; 1 = Yes). Gene Ontology annotations and probe sets that mapped to B73 gene models are also noted.

Supplemental Table 3. Validation of modules based on mean topological overlap.

Supplemental Table 4. The 276 tissue-specific genes reported by Sekhon et al. (2011) and their module memberships.

Supplemental Table 5. Gene ontology terms over-represented within modules.

Supplemental Table 6. Rank of the transcription related gene with the highest module membership (MM) in each module.

Supplemental Table 7. Modules with over-represented promoter motifs, within promoter regions sized 1.0 kb, 1.5 kb, and 2.0 kb.


Supplemental Table 3: Validation of modules based on mean topological overlap.

Module a   mean TO observed b   mean TO random c   p-value d
Zm_mod01   0.1548               0.0635             <2x10^-5
Zm_mod02   0.1358               0.0635             <2x10^-5
Zm_mod03   0.1106               0.0638             <2x10^-5
Zm_mod04   0.1160               0.0635             <2x10^-5
Zm_mod05   0.1382               0.0636             <2x10^-5
Zm_mod06   0.1301               0.0637             <2x10^-5
Zm_mod07   0.0946               0.0637             <2x10^-5
Zm_mod08   0.0848               0.0638             <2x10^-5
Zm_mod09   0.0838               0.0639             <2x10^-5
Zm_mod10   0.0778               0.0640             <2x10^-5
Zm_mod11   0.0856               0.0640             <2x10^-5
Zm_mod12   0.1557               0.0645             <2x10^-5
Zm_mod13   0.0720               0.0651             <2x10^-5
Zm_mod14   0.0864               0.0649             <2x10^-5
Zm_mod15   0.1242               0.0650             <2x10^-5
Zm_mod16   0.0846               0.0675             <2x10^-5
Zm_mod17   0.0943               0.0679             <2x10^-5
Zm_mod18   0.0735               0.0687             0.0200
Zm_mod19   0.0963               0.0700             <2x10^-5
Zm_mod20   0.1036               0.0714             <2x10^-5
Zm_mod21   0.0900               0.0716             <2x10^-5
Zm_mod22   0.0884               0.0764             <2x10^-5
Zm_mod23   0.0971               0.0789             <2x10^-5
Zm_mod24   0.1020               0.0815             <2x10^-5
Zm_mod25   0.0621               0.0670             0.9802
Zm_mod26   0.0678               0.0662             0.1602
Zm_mod27   0.0563               0.0689             1.0000
Zm_mod28   0.0725               0.0682             0.0598
Zm_mod29   0.0728               0.0690             0.1196
Zm_mod30   0.0741               0.0697             0.0999
Zm_mod31   0.0643               0.0698             1.0000
Zm_mod32   0.0635               0.0704             1.0000
Zm_mod33   0.0667               0.0711             0.9001
Zm_mod34   0.0652               0.0705             0.9799
Zm_mod35   0.0697               0.0711             0.6198
Zm_mod36   0.0686               0.0724             0.8200
Zm_mod37   0.0757               0.0726             0.1404
Zm_mod38   0.0747               0.0758             0.5794
Zm_mod39   0.0672               0.0792             1.0000
Zm_mod40   0.0697               0.0790             0.9800
Zm_mod41   0.0920               0.0824             0.0597
Zm_mod42   0.0605               0.0815             1.0000
Zm_mod43   0.0835               0.0831             0.4004
Zm_mod44   0.0871               0.0838             0.1601
Zm_mod45   0.0788               0.0851             0.8801
Zm_mod46   0.0747               0.0930             1.0000
Zm_mod47   0.0786               0.0940             1.0000
Zm_mod48   0.0887               0.0926             0.7997
Zm_mod49   0.0815               0.0925             0.9800

a modules are ordered by size
b the mean topological overlap score for the genes in the module
c the mean topological overlap score of 50,000 iterations
d p-value – the probability of finding a greater or equal TO in a sample of 50,000 collections of modules comprised of genes selected at random
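The validation in Supplemental Table 3 compares each module's mean topological overlap (TO) with that of randomly drawn gene sets of the same size. A p-value of <2x10^-5 corresponds to fewer than 1 in 50,000 random sets reaching the observed value. The Python sketch below illustrates the idea on a toy network. It assumes the standard weighted topological overlap formula, TO_ij = (Σ_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij), and a hypothetical 12-gene adjacency matrix; the function names and the sampling scheme are illustrative and are not taken from the original analysis.

```python
import random
from itertools import combinations


def topological_overlap(adj):
    """Weighted topological overlap matrix from an adjacency with zero diagonal."""
    n = len(adj)
    k = [sum(row) for row in adj]  # node connectivities
    tom = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
        tom[i][j] = tom[j][i] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


def mean_to(tom, genes):
    """Mean topological overlap over all gene pairs in a set."""
    pairs = list(combinations(genes, 2))
    return sum(tom[i][j] for i, j in pairs) / len(pairs)


def permutation_p(tom, module, n_iter=2000, seed=0):
    """Fraction of random gene sets whose mean TO is >= the observed mean TO."""
    rng = random.Random(seed)
    observed = mean_to(tom, module)
    universe = range(len(tom))
    hits = sum(
        mean_to(tom, rng.sample(universe, len(module))) >= observed
        for _ in range(n_iter)
    )
    return observed, hits / n_iter


# Hypothetical network: genes 0-3 are tightly connected (0.8), all others weakly (0.1).
n_genes = 12
adj = [
    [0.0 if i == j else (0.8 if i < 4 and j < 4 else 0.1) for j in range(n_genes)]
    for i in range(n_genes)
]
tom = topological_overlap(adj)
observed, p_value = permutation_p(tom, [0, 1, 2, 3])
```

Only a random draw of exactly the four tightly connected genes can match the observed mean TO here, so the permutation p-value is small, mirroring the behavior reported for the larger modules.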


Supplemental Table 6: Rank of the transcription related gene with the highest module membership (MM) in each module.

Module     Module Size   Rank of Highest TF   p-value, Random Order a   p-value, Random Genes from Genome b
Zm_mod01   2301           2                   0.1043                    0.1110
Zm_mod02   2050           2                   0.0995                    0.1126
Zm_mod03   1920          17                   0.6372                    0.6365
Zm_mod04   1793          36                   0.9367                    0.8825
Zm_mod05    971          26                   0.8468                    0.7889
Zm_mod06   1259          74                   0.9259                    0.9880
Zm_mod07    916          13                   0.6177                    0.5418
Zm_mod08    728           5                   0.2929                    0.2584
Zm_mod09    655           1                   0.0644                    0.0577
Zm_mod10    579          47                   0.9109                    0.9410
Zm_mod11    607          14                   0.6107                    0.5667
Zm_mod12    538          25                   0.5620                    0.7771
Zm_mod13    338          12                   0.7164                    0.5122
Zm_mod14    192          12                   0.4836                    0.5124
Zm_mod15    152           5                   0.0582                    0.2589
Zm_mod16    156          12                   0.5992                    0.5100
Zm_mod17     67           1                   0.0457                    0.0580
Zm_mod18     74           9                   0.4088                    0.4150
Zm_mod19     66           2                   0.1617                    0.1134
Zm_mod20     45          21                   0.9682                    0.7141
Zm_mod21     47           5                   0.5071                    0.2588
Zm_mod22     26           1                   0.1581                    0.0583
Zm_mod23     27          22                   0.8077                    0.7317
Zm_mod24     20           1                   0.1054                    0.0590

a “Random order” values are the probability of obtaining an equal or higher rank when the genes in the module are placed in random order (based on 100,000 iterations).
b “Random genes from genome” values are the probability of obtaining an equal or higher rank when [module size] genes are selected randomly from all genes in the data set (based on 100,000 iterations).

Complete list of transcription-related GO categories:
GO:0000122 negative regulation of transcription from RNA polymerase II promoter
GO:0000467 exonucleolytic trimming to generate mature 3'-end of 5.8S rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA)
GO:0002103 endonucleolytic cleavage of tetracistronic rRNA transcript (SSU-rRNA, LSU-rRNA, 4.5S-rRNA, 5S-rRNA)
GO:0003700 sequence-specific DNA binding transcription factor activity
GO:0003702 RNA polymerase II transcription factor activity
GO:0003711 transcription elongation regulator activity
GO:0003712 transcription cofactor activity
GO:0003713 transcription coactivator activity
GO:0003714 transcription corepressor activity
GO:0003715 transcription termination factor activity
GO:0005667 transcription factor complex
GO:0005669 transcription factor TFIID complex
GO:0005672 transcription factor TFIIA complex
GO:0005673 transcription factor TFIIE complex
GO:0005674 transcription factor TFIIF complex
GO:0006283 transcription-coupled nucleotide-excision repair
GO:0006350 transcription
GO:0006351 transcription, DNA-dependent
GO:0006352 transcription initiation, DNA-dependent
GO:0006353 transcription termination, DNA-dependent
GO:0006354 transcription elongation, DNA-dependent
GO:0006355 regulation of transcription, DNA-dependent
GO:0006357 regulation of transcription from RNA polymerase II promoter
GO:0006366 transcription from RNA polymerase II promoter
GO:0006367 transcription initiation from RNA polymerase II promoter
GO:0006368 transcription elongation from RNA polymerase II promoter
GO:0006383 transcription from RNA polymerase III promoter
GO:0006410 transcription, RNA-dependent
GO:0008023 transcription elongation factor complex
GO:0008134 transcription factor binding
GO:0008159 positive transcription elongation factor activity
GO:0009303 rRNA transcription
GO:0016251 general RNA polymerase II transcription factor activity
GO:0016480 negative regulation of transcription from RNA polymerase III promoter
GO:0016481 negative regulation of transcription
GO:0016563 transcription activator activity
GO:0016564 transcription repressor activity
GO:0016566 specific transcriptional repressor activity
GO:0016986 transcription initiation factor activity
GO:0017163 basal transcription repressor activity
GO:0030528 transcription regulator activity
GO:0032583 regulation of gene-specific transcription
GO:0032968 positive regulation of transcription elongation from RNA polymerase II promoter
GO:0045449 regulation of transcription
GO:0045892 negative regulation of transcription, DNA-dependent
GO:0045941 positive regulation of transcription
GO:0045944 positive regulation of transcription from RNA polymerase II promoter
GO:0048096 chromatin-mediated maintenance of transcription
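The “random order” p-value of Supplemental Table 6 can be illustrated with a short Python sketch. If a module of n genes contains t transcription-related genes and the genes are ordered at random, the probability that the best-ranked transcription-related gene has rank r or better is 1 − C(n − t, r) / C(n, r). The example below compares this closed form with a simulation. The module size of 67 matches Zm_mod17, but the count of three transcription-related genes is a hypothetical value chosen for illustration, as the per-module counts are not given in this section.

```python
import random
from math import comb


def p_rank_exact(module_size, n_tf, observed_rank):
    """P(best-ranked TF has rank <= observed_rank) under a random ordering."""
    # Complement: none of the top `observed_rank` positions holds a TF.
    none_in_top = comb(module_size - n_tf, observed_rank) / comb(module_size, observed_rank)
    return 1 - none_in_top


def p_rank_simulated(module_size, n_tf, observed_rank, n_iter=20_000, seed=0):
    """Estimate the same probability by placing the TFs at random ranks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        best = min(rng.sample(range(module_size), n_tf)) + 1  # 1-based rank
        hits += best <= observed_rank
    return hits / n_iter


# Hypothetical module: 67 genes, 3 of them transcription-related, observed rank 1.
exact = p_rank_exact(67, 3, 1)
simulated = p_rank_simulated(67, 3, 1)
```

The “random genes from genome” p-value would differ only in that the number of transcription-related genes in each random set is itself drawn by sampling genes from the whole data set.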


Supplemental Table 7: Modules with over-represented promoter motifs, within promoter regions sized 1.0kb, 1.5kb, and 2.0kb.

promoter region = 1.0kb
Promoter Motif        Module p-value (x 10^-3) a
Module columns, in order: Zm_mod01, Zm_mod02, Zm_mod05, Zm_mod08, Zm_mod18

GTGCCC(A/T)(A/T) f    0.248
CC(A/G)CCC            1.076
GTGCCCTT f            8.348
AAAG                  8.876
AATAAA                6.989
CCAAT                 2.583
CC(A/T)ACC            3.046
TATATAT               0.955
CACGTC                7.571
CCCCCG                6.299
CGCGCC                0.343
CGTGG                 2.828
GCCCCGG               7.889

a p-values are calculated from Fisher’s Exact Test (α < 0.01)

promoter region = 1.5kb
Module columns, in order: Zm_mod02, Zm_mod05, Zm_mod15, Zm_mod18

AAACCA
CC(G/A)CCC            5.578
GTGCCCTT f            0.003
CC(A/G)CCC            8.464
CCAAT                 8.740
CCCCGG
CGCGCC                7.027
CGTGG                 2.109
GCCCCGG               3.702
TATATAT               0.415
TGGTTT                5.790

a p-values are calculated from Fisher’s Exact Test (α < 0.01)

promoter region = 2.0kb
Module columns, in order: Zm_mod02, Zm_mod09, Zm_mod13, Zm_mod15, Zm_mod21

AATAAA                0.007
CCCCGG                8.574
GCCCCGG               4.540
GTGCCC(A/T)(A/T)      9.896
TATATAT               2.156
TGAGTCA               6.459
TGGTTT                0.672

a p-values are calculated from Fisher’s Exact Test (α < 0.01)

Figure 1: Selected tissues at time of sampling. A. VE leaf and root; B. V1 leaf and root; C. V2 leaf, stalk, seminal and nodal root; D. V4 tassel; E. V4 tassel primordium; F. V5 tassel; G. V8 tassel and top ear shoot; H. V10 tassel and top ear shoot; I. V13 top ear; J. V15 silk; K. V15 cob and floret; L. V15 tassel; M. VT anthers and pollen; N. R1 root; O. R1 stalk; P. R1 leaf; Q. 10DAP embryo and endosperm; R. 17DAP milky endosperm; S. 24DAP leaf; T. 24DAP root; U. 24DAP embryo, endosperm and pericarp; V. 31DAP leaf; W. 31DAP embryo, endosperm and pericarp.


[Figure 2 flow chart, in reading order:]
82,661 probe sets. Are they expressed? Not expressed: 26,989 (probe sets with no expression); expressed: 55,672.
BLAST probe sets vs. B73 transcripts. Match: 33,664 (define transcripts as "gene models"); no match: 22,008.
Exonerate Unigenes vs. B73 genome. Match: 9,919 (define Unigenes as "gene models"); no match: 12,089 (Unigenes which do not match genome).
Probe sets which match transcripts/genome: probe sets which cross-hybridize (10,610); probe sets with no cross-hybridization (32,973); redundant probe sets (10,186); identified probe sets (22,787).
Probe sets used in analysis (34,876).

Figure 2. Flow chart detailing the annotation of the array platform. Probe sets which were predicted to cross-hybridize, were redundant, or did not show expression were removed from the analysis. The maize transcriptome network was constructed from 34,876 probe sets, of which 12,089 were unmapped, and 22,787 were mapped to B73 maize gene models or unannotated regions of the maize B73 genome.

[Figure 3 image: dendrogram of the tissue/stage samples listed in Table I; vertical axis: Height (200 to 800).]

Figure 3. Average transcript abundances for the 50 tissue/stage samples (three biological replicates of each; see Table I) were clustered using the flashClust module from WGCNA. Developmental stages of the same tissue tend to cluster together. See Table I for definitions of developmental stages.

[Figure 4 image: heat map with a color scale from -1 to 1. Tissues: Anthers, Cob, Ear, Embryo, Endosperm, Floret, Husk, Leaf, Ovule, Pericarp, Root, Silk, Stalk, Tassel. Module eigengenes (ME): Zm_mod13, Zm_mod18, Zm_mod23, Zm_mod02, Zm_mod19, Zm_mod03, Zm_mod01, Zm_mod04, Zm_mod21, Zm_mod22, Zm_mod07, Zm_mod16, Zm_mod10, Zm_mod11, Zm_mod20, Zm_mod24, Zm_mod05, Zm_mod17, Zm_mod08, Zm_mod09, Zm_mod12, Zm_mod15, Zm_mod06, Zm_mod14. Each cell gives a correlation coefficient with its p-value in parentheses.]

Figure 4. A heat map of module eigengene and tissue correlations. Boxes contain Pearson correlation coefficients and their associated p-values. A strong positive correlation (red) indicates that the ME has higher expression in the given tissue relative to all other tissues. A strong negative correlation (blue) indicates low expression in the given tissue relative to all other tissues. Tissues were classified into groups as described in Table I.
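One way to read the correlations in Figure 4 is as the Pearson correlation between a module eigengene and a 0/1 indicator marking the samples that belong to a tissue group. The Python sketch below shows that calculation. The binary coding is an assumption about how the tissue groups were encoded, and the sample labels and eigengene values are hypothetical; p-values are omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tissue_correlation(eigengene, tissue, target):
    """Correlate an eigengene with a 0/1 indicator for one tissue group."""
    indicator = [1.0 if t == target else 0.0 for t in tissue]
    return pearson(eigengene, indicator)


# Hypothetical eigengene values for six samples and their tissue groups.
tissue = ["Leaf", "Leaf", "Root", "Root", "Embryo", "Embryo"]
eigengene = [0.55, 0.50, -0.30, -0.25, -0.20, -0.30]

r_leaf = tissue_correlation(eigengene, tissue, "Leaf")
r_root = tissue_correlation(eigengene, tissue, "Root")
```

With this coding, a coefficient near 1 means the eigengene is high in the target tissue and low everywhere else, which matches the interpretation given in the legend.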

[Figure 5 image: heat map with a color scale from -1 to 1. Leaf groups: All_Leaf_Stages, Vegetative_Leaves, Reproductive_Leaves. Module eigengenes (ME): Zm_mod13, Zm_mod18, Zm_mod23, Zm_mod02, Zm_mod19, Zm_mod03, Zm_mod01, Zm_mod04, Zm_mod21, Zm_mod22, Zm_mod07, Zm_mod16, Zm_mod10, Zm_mod11, Zm_mod20, Zm_mod24, Zm_mod05, Zm_mod17, Zm_mod08, Zm_mod09, Zm_mod12, Zm_mod15, Zm_mod06, Zm_mod14. Each cell gives a correlation coefficient with its p-value in parentheses.]

Figure 5. Module eigengene correlations with leaves sampled before and after anthesis and silking. All_Leaf_Stages represents all nine leaf samples. Vegetative leaves are the V1, V2, and V5 leaves. Reproductive leaves are samples from the second leaf above the top ear (R1 and 10, 17, 24, and 31 DAP). See Table I for definitions of developmental stages.