Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic...

14
LARGE-SCALE BIOLOGY ARTICLE Extensive cis-Regulatory Variation Robust to Environmental Perturbation in Arabidopsis W Francisco A. Cubillos, a,b,c Oliver Stegle, d,e Cécile Grondin, a,b Matthieu Canut, a,b Sébastien Tisné, a,b Isabelle Gy, a,b and Olivier Loudet a,b,1 a INRA, Institut Jean-Pierre Bourgin, UMR 1318, ERL CNRS 3559, Saclay Plant Sciences, RD10, F-78026 Versailles, France b AgroParisTech, Institut Jean-Pierre Bourgin, UMR 1318, ERL CNRS 3559, Saclay Plant Sciences, RD10, F-78026 Versailles, France c Departamento de Ciencia y Tecnología de los Alimentos, Universidad de Santiago de Chile, Santiago, Chile d Max Planck Institute for Developmental Biology and Max Planck Institute for Intelligent Systems, 72076 Tuebingen, Germany e European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom ORCID ID: 0000-0003-3717-0137 (O.L.) cis- and trans-acting factors affect gene expression and responses to environmental conditions. However, for most plant systems, we lack a comprehensive map of these factors and their interaction with environmental variation. Here, we examined allele-specic expression (ASE) in an F1 hybrid to study how alleles from two Arabidopsis thaliana accessions affect gene expression. To investigate the effect of the environment, we used drought stress and developed a variance component model to estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and their interactions. We quantied ASE for 11,003 genes, identifying 3318 genes with consistent ASE in control and stress conditions, demonstrating that cis-acting genetic effects are essentially robust to changes in the environment. Moreover, we found 1618 genes with genotype x environment (GxE) interactions, mostly cis x E interactions with magnitude changes in ASE. We found fewer trans x E interactions, but these effects were relatively less robust across conditions, showing more changes in the direction of the effect between environments; this conrms that trans-regulation plays an important role in the response to environmental conditions. Our data provide a detailed map of cis- and trans-regulation and GxE interactions in A. thaliana, laying the ground for mechanistic investigations and studies in other plants and environments. INTRODUCTION Quantitative traits vary as a function of the genetic makeup of the individual and environmental factors. Large-scale linkage mapping and genome-wide association studies have uncovered an abun- dance of associations between specic genetic loci and pheno- types, explaining substantial proportions of the variability in a trait (Alonso-Blanco et al., 2009; Trontin et al., 2011). Ultimately, the phenotypic response to such quantitative trait loci is mediated at the molecular level, motivating efforts to investigate the genetic regu- lation of the transcriptional state and its response to the environ- ment. To this end, molecular proling studies have shown that individual gene expression levels are subject to genetic control and implicated this genetic variation in effects on phenotype (Baxter et al., 2008; Studer et al., 2011; Cubillos et al., 2012a; Roy et al., 2013). For instance, cis-regulatory allelic variation at HIGH-AFFINITY K+ TRANSPORTER1 has been reported to underlie salinity toler- ance, suggesting local adaptation to coastal and/or saline soils in Europe (Baxter et al., 2010). Genome-wide mapping in recombi- nant populations have identied expression quantitative trait loci (eQTLs), contributing to our understanding of orchestrated tran- scriptional modules and their effect on complex traits (Cubillos et al., 2012a). Data integration and network reconstruction has proven fruitful to infer the biological consequences of transcrip- tional variation (Jiménez-Gómez et al., 2010; Kerwin et al., 2011; Parts et al., 2011; Zhu et al., 2012). So far, most eQTL studies have reported that polymorphisms near the encoded transcript explain more of the variance in expression differences (local-eQTL, i.e., potentially cis-acting eQTLs) than trans factors located elsewhere in the genome and detected as a distant-eQTL (West et al., 2007; Cubillos et al., 2012b). However, the inherent differ- ences in statistical power to detect cis-acting versus trans-acting regulators preclude an objective assessment of their relative contributions to expression variability. First, each gene has many more genome-wide variants to be considered when test- ing for trans effects; moreover, for any individual gene, several trans factors likely act together (including indirectly), which renders an explicit identication of all causal variants difcult. Addressing these challenges is important, as accurate quan- tication of the cis and trans components of gene expression traits can yield new insights into the genetic architecture of molecular traits and their interactions in networks (Kliebenstein, 2009). 1 Address correspondence to [email protected]. The author responsible for distribution of materials integral to the ndings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Olivier Loudet (olivier. [email protected]). W Online version contains Web-only data. www.plantcell.org/cgi/doi/10.1105/tpc.114.130310 The Plant Cell, Vol. 26: 4298–4310, November 2014, www.plantcell.org ã 2014 American Society of Plant Biologists. All rights reserved.

Transcript of Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic...

Page 1: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

LARGE-SCALE BIOLOGY ARTICLE

Extensive cis-Regulatory Variation Robust to EnvironmentalPerturbation in ArabidopsisW

Francisco A. Cubillos,a,b,c Oliver Stegle,d,e Cécile Grondin,a,b Matthieu Canut,a,b Sébastien Tisné,a,b Isabelle Gy,a,b

and Olivier Loudeta,b,1

a INRA, Institut Jean-Pierre Bourgin, UMR 1318, ERL CNRS 3559, Saclay Plant Sciences, RD10, F-78026 Versailles, Franceb AgroParisTech, Institut Jean-Pierre Bourgin, UMR 1318, ERL CNRS 3559, Saclay Plant Sciences, RD10, F-78026 Versailles, FrancecDepartamento de Ciencia y Tecnología de los Alimentos, Universidad de Santiago de Chile, Santiago, ChiledMax Planck Institute for Developmental Biology and Max Planck Institute for Intelligent Systems, 72076 Tuebingen, Germanye European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CambridgeCB10 1SD, United Kingdom

ORCID ID: 0000-0003-3717-0137 (O.L.)

cis- and trans-acting factors affect gene expression and responses to environmental conditions. However, for most plantsystems, we lack a comprehensive map of these factors and their interaction with environmental variation. Here, we examinedallele-specific expression (ASE) in an F1 hybrid to study how alleles from two Arabidopsis thaliana accessions affect geneexpression. To investigate the effect of the environment, we used drought stress and developed a variance component model toestimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and theirinteractions. We quantified ASE for 11,003 genes, identifying 3318 genes with consistent ASE in control and stress conditions,demonstrating that cis-acting genetic effects are essentially robust to changes in the environment. Moreover, we found 1618genes with genotype x environment (GxE) interactions, mostly cis x E interactions with magnitude changes in ASE. We foundfewer trans x E interactions, but these effects were relatively less robust across conditions, showing more changes in thedirection of the effect between environments; this confirms that trans-regulation plays an important role in the response toenvironmental conditions. Our data provide a detailed map of cis- and trans-regulation and GxE interactions in A. thaliana, layingthe ground for mechanistic investigations and studies in other plants and environments.

INTRODUCTION

Quantitative traits vary as a function of the genetic makeup of theindividual and environmental factors. Large-scale linkage mappingand genome-wide association studies have uncovered an abun-dance of associations between specific genetic loci and pheno-types, explaining substantial proportions of the variability in a trait(Alonso-Blanco et al., 2009; Trontin et al., 2011). Ultimately, thephenotypic response to such quantitative trait loci is mediated at themolecular level, motivating efforts to investigate the genetic regu-lation of the transcriptional state and its response to the environ-ment. To this end, molecular profiling studies have shown thatindividual gene expression levels are subject to genetic control andimplicated this genetic variation in effects on phenotype (Baxteret al., 2008; Studer et al., 2011; Cubillos et al., 2012a; Roy et al.,2013). For instance, cis-regulatory allelic variation atHIGH-AFFINITYK+ TRANSPORTER1 has been reported to underlie salinity toler-ance, suggesting local adaptation to coastal and/or saline soils in

Europe (Baxter et al., 2010). Genome-wide mapping in recombi-nant populations have identified expression quantitative trait loci(eQTLs), contributing to our understanding of orchestrated tran-scriptional modules and their effect on complex traits (Cubilloset al., 2012a). Data integration and network reconstruction hasproven fruitful to infer the biological consequences of transcrip-tional variation (Jiménez-Gómez et al., 2010; Kerwin et al., 2011;Parts et al., 2011; Zhu et al., 2012). So far, most eQTL studieshave reported that polymorphisms near the encoded transcriptexplain more of the variance in expression differences (local-eQTL,i.e., potentially cis-acting eQTLs) than trans factors locatedelsewhere in the genome and detected as a distant-eQTL (Westet al., 2007; Cubillos et al., 2012b). However, the inherent differ-ences in statistical power to detect cis-acting versus trans-actingregulators preclude an objective assessment of their relativecontributions to expression variability. First, each gene hasmany more genome-wide variants to be considered when test-ing for trans effects; moreover, for any individual gene, severaltrans factors likely act together (including indirectly), whichrenders an explicit identification of all causal variants difficult.Addressing these challenges is important, as accurate quan-tification of the cis and trans components of gene expressiontraits can yield new insights into the genetic architecture ofmolecular traits and their interactions in networks (Kliebenstein,

2009).

1 Address correspondence to [email protected] author responsible for distribution of materials integral to the findingspresented in this article in accordance with the policy described in theInstructions for Authors (www.plantcell.org) is: Olivier Loudet ([email protected]).W Online version contains Web-only data.www.plantcell.org/cgi/doi/10.1105/tpc.114.130310

The Plant Cell, Vol. 26: 4298–4310, November 2014, www.plantcell.org ã 2014 American Society of Plant Biologists. All rights reserved.

Page 2: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

The extent to which cis- and trans-acting factors respond toenvironmental perturbations in plants is still unclear. Genes cancoordinately react to environmental stimuli altering the cis andtrans response (Kliebenstein, 2009). Evidence in human cohortssuggests that cis-eQTLs are predominantly robust with respectto cellular environments such as tissue type, where genetic ef-fects on transcription were rarely found to be specific to anenvironment (Grundberg et al., 2011). Similarly, local putative ciseQTLs in mice and Arabidopsis thaliana recombinant inbredlines were largely conserved between different treatments(Ziebarth et al., 2012; Lowry et al., 2013). However, owing to thestatistical limitations mentioned above, most studies have fo-cused on putative cis effects and, hence, a balanced assess-ment of how regulatory variants, both in cis and trans, respondto environmental changes is lacking. In a single environmentalbackground, the utilization of F1 hybrids has become an es-tablished tool to estimate allele-specific expression (ASE), whichallows direct quantification of cis-regulatory effects. First, thisapproach provides unambiguous proof that proximal cis-actingpolymorphisms cause expression differences. The principlefor establishing the extent of cis variation in F1 hybrids isbased on the fact that trans-acting variants will affect bothparental alleles equally, hence cancelling their respective con-tribution, while cis effects will remain allele-specific and standout. In the F1 hybrids, the trans component of genetic effectscan be assessed by contrasting the F1 hybrid to pools ofparental RNA with balanced contributions from both parents.In expression estimates from these pooled samples, the transeffect is not cancelled and therefore can be accounted for(as one factor summing all individual trans effects and po-tentially some more complex features), allowing the dissec-tion of cis- and trans-regulatory variation for every transcript(Wittkopp et al., 2004; McManus et al., 2010). The deep as-sessment of ASE in different species has revealed wide di-vergence within regulatory regions between and within speciesranging from fruitflies (Drosophila melanogaster; McManuset al., 2010), Arabidopsis (Zhang and Borevitz, 2009), yeast(Emerson et al., 2010; Skelly et al., 2011; Martin et al., 2012),and mouse (Goncalves et al., 2012) to maize (Zea mays;Paschold et al., 2012). Consequently, integration of these datainto a single model allows estimation of and statistically testingfor the contribution of cis and trans factors (Goncalves et al.,2012).

Here, to determine the extent to which genotype and envi-ronmental factors determine allelic expression differences in A.thaliana, we extended the principles of ASE analysis to studygenotype-environment interactions. We developed a sensitiveapproach to estimate the contribution of cis- and trans-regulation altering transcript abundance between divergent ac-cessions Columbia-0 (Col-0) and Cape Verde Island (Cvi-0),grown under contrasting environments. RNA-seq-based ex-pression estimates were used to quantify genome-wide patternsof ASE in F1 hybrids and their parental plants. This approachallowed for high-resolution eQTL mapping despite a smallsample, allowing us to detect more than 4700 genes exhibitingASE. Moreover, the estimation of transcript abundance in twoconditions enabled the assessment of additive environmentaleffects as well as the extent to which cis and trans genetic

effects respond to this external perturbation (genotype x envi-ronment [GxE]). Differences between conditions could largelybe attributed to ASE magnitude changes (cis x E) and trans-interactions due to changes in the direction of the effect be-tween conditions. Overall, these results reveal the prevalence ofASE and natural variation in expression levels between twodivergent accessions, providing direct evidence of the tran-scriptional effects of allelic changes between environmentsand the robust response of cis-acting genetic regulators acrossconditions.

RESULTS

Cvi-0 Resequencing

Accurate estimation of allele-specific expression depends ona detailed map of segregating alleles between the parental lines.To this end, we resequenced the Col-0 and Cvi-0 genomes at;233 coverage to detect single nucleotide polymorphisms(SNPs) and polymorphic regions that could allow for the quan-tification of ASE. We surveyed the variation in SNPs, genecontent, and insertions and deletions (indels). Genome-wide, weidentified 726,112 high-quality SNPs (see Methods) and 116,129nucleotide indels in the Cvi-0 nuclear genome with respect tothe Col-0 reference (Supplemental Data Set 1), highlighting thegenetic distance of these accessions in comparison to previouslarge-scale surveys of natural A. thaliana accessions (Cao et al.,2011). Many of these polymorphisms were found in intergenicregions, which is in line with previous variation studies (Ganet al., 2011; Schneeberger et al., 2011). Within coding regions,we found more exonic SNPs than intronic SNPs (250,040 versus207,324), which stands in striking contrast with indels, whichwere substantially more prevalent in introns (5-fold enrichment;Figure 1). Overall, SNP variants were found in exonic regions of26,698 protein-coding genes, covering 79.4% of the annotatedelements in the TAIR10 A. thaliana genome. In order to validateour SNP calling in Cvi-0, we performed Sanger sequencing of 29genes previously identified as local-eQTLs for the same cross

Figure 1. Distribution of Polymorphisms from the Assembly of the Cvi-0Genome.

Classification and percentage of SNPs and indels based on the Col-0reference genome.

Robust cis-Regulatory Variation under Stress 4299

Page 3: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

(Cubillos et al., 2012b). This orthogonal technology identifieda total of 155 SNPs, which were in complete agreement withthe genotype calling based on the Illumina short read data (nofalse-positive SNP call), suggesting sufficient accuracy of ourSNP-calling pipeline to reliably estimate ASE. In addition, weresequenced the Col-0 accession used in this study to a similarcoverage as Cvi-0. Interestingly, our Col-0 line differed from thereference TAIR10 sequence at 711 unambiguously identifiedSNPs (Supplemental Data Set 1) with high confidence. From thisset, 33.3% represent SNPs within coding exon regions; how-ever, only 25% of them are nonsynonymous changes (affecting59 genes).

Quantification of ASE in Parental Pools and F1 Hybrids

To study the effect of the environment, as well as cis- and trans-acting factors, on gene expression, we examined the extent ofexpression changes across environments in F1 hybrids andparental pools between Col-0 and Cvi-0 accessions in A. thaliana.Both the hybrids and the parental strains were grown in twoenvironmental conditions: well-watered (four F1 individuals,three Col-0 x Cvi-0 and one Cvi-0 x Col-0) and water-deficit(three F1 individuals, Col-0 x Cvi-0) conditions (hereafter referredto as control and drought stress conditions, respectively; seeMethods and Figure 2). For every F1 individual, RNA-seq gen-erated Illumina single-end reads of 100 bp from cDNA libraries.Likewise, to account for trans variation, we considered fourparental pools between accessions (two pools for each growthcondition), which were sequenced using equal amounts of RNAfrom each parent (Figure 2) (Wittkopp et al., 2004; McManuset al., 2010). Incorporating the parental pools enabled us to (1)discriminate expression divergence due to segregating purelycis-acting variants (observed in F1 hybrids), trans-acting factors(summed altogether), or a combination of both genetic effects(Figure 3A) (Wittkopp et al., 2004) and (2) estimate the responseof these factors to environmental perturbations. Here, trans ef-fects include distal genetic effects, interactions between cis-and trans-acting variants, dominance effects, and higher ordergenetic effects that cannot be explained by the cis haplotype.

RNA-seq reads for each sample were aligned to the Col-0reference genome, using a variant-aware mapping approach toavoid biases in the mapping procedure. Reads that spannedsegregating variants between Col-0 and Cvi-0 were annotatedas originating from one of the two haplotypes. The geneticdistance between Cvi-0 and Col-0 permitted us to assign a largefraction of RNA-seq reads to either genome (on average 13%of the reads), and we used these counts to quantify ASE(Supplemental Figure 1). Haplotype-specific expression esti-mates were obtained for each gene, counting the number ofallele-specific reads stemming from either genomic background.To maximize the sensitivity to detect ASE events, reads fromreplicates were pooled into four groups, the parental pool andF1 hybrids, each under control and drought stress conditions(Figure 2). Genes with fewer than five allele-specific reads foreach parental and hybrid group were removed from the analysis,resulting in 11,003 genes (hereafter referred to as ASEq genes)for which we could quantify ASE, and representing 63% of thegenes for which reads were found.

Initially, to estimate the number of genes exhibiting significantallele-specific imbalance in each condition, a basic binomialexact test with a genome-wide false discovery rate (FDR) of 5%was employed in hybrids, using all reads per gene that could beunequivocally assigned to an allele. To ensure that these areregulatory genetic effects, we first tested for genomic imprinting,which refers to a mechanism in plants and animals, crucial fornormal development, where inherited alleles are differentiallyexpressed depending on the sex of the parents (Feil and Berger,2007). In A. thaliana, imprinting effects have been observed inendosperm during embryo development (Scott et al., 1998;Gehring et al., 2011). A test for imprinting was performed bypairwise comparison of ASE levels between the three F1 Col xCvi individuals and the single reciprocal F1 Cvi x Col plant grownunder control conditions. As expected for individuals at lategrowth stages, we did not detect genes with consistent oppo-site ASE levels between reciprocal crosses (q < 0.05), rejectingthe hypothesis of imprinting as a cause for the observed allelicimbalance.Genome wide, we detected 4706 and 4424 genes exhibiting

allele-specific imbalance in F1 hybrids for control and droughtstress at a 5% FDR (Figure 3B), representing 42.7 and 40.2%,respectively, of all the ASEq (Figure 3C; Supplemental DataSet 2). Across genes, the ASE ratios were highly correlatedbetween environments (Spearman correlation test, rho = 0.96,P < 0.00001), with 3318 genes exhibiting significant allelic im-balance in both environments (q < 0.05, binomial exact test;green dots in Figure 3B), suggesting widespread cis-regulatoryvariation sustained across conditions. Moreover, 47.4% (46.8%)of the significant ASE genes exhibited greater expression levelsfrom the Cvi allele for the control (stress) condition, in agreementwith an expected symmetrical divergence of cis-regulatorypolymorphisms. We considered possible reference biases(Supplemental Figure 2), finding that our alignment approachthat accounts for Cvi variants (see Methods) was close to un-biased. This suggests that our results are unlikely to be severelyaffected by reference bias. To evaluate the accuracy of the RNA-seq-based detection of ASE, we validated our results in thesame hybrid samples by pyrosequencing (Wittkopp et al., 2004),estimating ASE in 34 genes previously mapped as local-eQTLs(Cubillos et al., 2012b) (Supplemental Figure 3). In both con-ditions, the RNA-seq-based estimates and the quantification bypyrosequencing were highly correlated (P < 0.0001, Spearmancorrelation test: R2 = 0.81 and R2 = 0.87 for control and droughtstress, respectively) (Supplemental Figure 3). We further com-pared the consistency between our estimates for genes withsignificant ASE and eQTLs identified in the Cvi-0 x Col-0 re-combinant inbred lines (Cubillos et al., 2012b). Out of the 2010genes for which local-eQTLs were previously detected (Figure 3C),we were able to evaluate ASE (under control conditions) for1475, and 954 (64.6%) showed evidence of variation attributableto cis factors with a consistent directional effect in 85.6% of thecases. Moreover, we found a significant correlation betweenboth studies in the extent of the differential allelic regulation forthose genes sharing directionality (P < 2e-16, R = 0.56 Pearsoncorrelation coefficient; Supplemental Figure 4), indicating theoverall consistency of these regulations across independentgenetic designs and experiments. Nevertheless, our current ASE

4300 The Plant Cell

Page 4: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

Figure 2. Overview of the ASE Approach.

Robust cis-Regulatory Variation under Stress 4301

Page 5: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

strategy detected over four thousand genes with a significantcis-contribution not previously mapped in the RILs, demon-strating its powerful complementation to previous eQTLs map-ping studies.

Contribution of cis- and trans-Regulatory Variants over ASE

Next, to obtain quantitative estimates of the effect of individualgenetic factors on expression differences between accessions,we employed a variance component model, initially inde-pendently of the environment, to estimate the variance in con-tributions of cis- and trans-regulatory polymorphisms tovariation in expression of each ASEq gene. Conceptually, thisapproach is similar to an ANOVA model, estimating the pro-portion of variance that is attributable to cis and trans effects.However, importantly, this approach models jointly the quanti-fication of RNA-seq reads specific for each allele, thereby as-certaining the effect of cis-regulatory variants (see Methods). Wearbitrarily fixed a minimal explained variance of 10% to classifygenes as either cis- or trans-regulated. Under this strict criterion,analysis of the 11,003 ASEq genes showed that variation in al-lelic expression is predominantly cis-regulated, rather thantrans-regulated (P < 2e-16, t test; Supplemental Data Set 2). Wedetected 5137 genes predominantly driven by cis factors, wherejust the cis component explained more than 10% of the trait’svariation (this was consistent in 83% of the cases with ASElevels as tested in the previous section, under either control and/or stress conditions, 5% FDR, binomial test). In addition, wefound 1155 genes with a predominant trans variance effect,which is four times lower than the analogous cis-contribution.Among the genes with pronounced ASE (>10% variance), wefound a higher average proportion of variance explained by ciseffects compared with trans (45.3% versus 21.5%, P < 0.0001Kolmogorov-Smirnov test). For 338 genes, the model identifiedboth cis and trans components explaining more than 10% of thegene expression variance, representing 3% of all genes withquantifiable ASE (Figure 4A). In line with the overall preferencesfor cis effects, also in this set of genes sharing cis- and trans-regulations, the average variance explained in cis was signifi-cantly greater than the variance explained by trans factors(34.2% versus 18.8%, P < 0.0001, Kolmogorov-Smirnov test).To understand the frequency at which cis- and trans-regulatoryvariants converge in the direction of their effect, we analyzed theset of 338 genes exhibiting both regulatory mechanisms. Wecategorized the direction of the cis and trans effect as either“convergent” (same direction) or “compensatory” (opposite di-rections). In 49.7% (51.4%) of the cases under control (stress)conditions, allelic effects were convergent between cis and trans

(Figure 4B), indicating a nearly uniform occurrence of multipleregulatory variants acting in different directions.We next examined the set of 1475 ASEq genes for which

local-eQTLs were identified in our previous study of the samecross, but in a RIL design (Cubillos et al., 2012b). ASE analysisrevealed that 994 of these genes indeed showed evidence ofvariation attributable to cis factors explaining over 10% of theexpression trait. This represents 67.4% consistency in cis ef-fects between these experiments, in agreement with similarfindings in mice studies (Lagarrigue et al., 2013). By contrast, outof the 2294 distant-eQTLs previously detected, the overlap withASEq genes associated with variable trans-acting regulation inthis study was only 6.1%, suggesting a less robust expressionpattern for trans factors between these experiments performedat different times in distinct conditions. This could also be af-fected by power issues or inherent limitations resulting from thedifference in genetic design to resolve specific cases (e.g.,compensating trans-eQTLs controlling the expression of onegene).

Environmental Effects on Expression Variation

To assess the extent to which environment affects gene ex-pression levels, we estimated the number of genes in F1 hybridsdifferentially expressed between drought and control conditions,independently of the parental origin of each allele, using DESeq(Anders and Huber, 2010). We found 678 genes exhibiting sig-nificant differences in expression between the two environ-ments (FDR 5%), with fold changes ranging from 0.4 to 29(Supplemental Figure 5A and Supplemental Data Set 2). Thislimited transcriptional response reflects our intention to performphysiologically relevant mild water deficit treatments resulting inquantitative decreases in growth rates (Tisné et al., 2013). Usingquantitative RT-PCR, we confirmed differential transcript ac-cumulation for six genes, finding good levels of agreement(Supplemental Figure 5B). The direction of changes for the 678stress-responsive genes was approximately balanced, with 342upregulated and 336 downregulated genes under drought stresscompared with the control condition. The analogous analysisperformed in the parental pools yielded 207 differentially ex-pressed genes (Supplemental Figure 5C), out of which only 112were also detected in F1 hybrids. Differences between hybridsand parental pools indicate that the genotype configuration canaffect the response of gene expression to the environment.Moreover, analysis of the F1 gene set responding to the envi-ronment by Singular Enrichment Analysis showed a significantenrichment for Gene Ontology biological processes respondingto stimulus (e.g., “response to chemical stimulus” and “response

Figure 2. (continued).

A diagram representing the steps followed for plant material generation, RNA-seq, data analysis, and ASE estimations is shown. Vertical arrows indicatethe number of Col-0, Cvi-0, F1 Col-0 x Cvi-0, and F1 Cvi-0 x Col-0 individuals grown on the Phenoscope platform and sequenced. Homogenous sets ofindividuals were selected based on rosette growth analysis (PRA: projected rosette area) to discard outliers (red growth profiles), and RNA sampleswere obtained for sequencing on an Illumina HiSeq2000. Reads were aligned to both parental genomes, and haplotype counts to either Cvi-0 or Col-0were assigned based on SNP data. Finally, ASE was estimated as the ratio between the number of Col and Cvi reads in F1 samples as well as inbalanced pools of parental samples and variance component model generated.

4302 The Plant Cell

Page 6: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

to jasmonic acid stimulus”) and stress (e.g., “response to oxidativestress”) (FDR < 0.05; Supplemental Table 1).

Comparisons between DESeq and ASE results identifieda shared set of 302 genes exhibiting ASE under both environ-ments and 138 exhibiting ASE only in a single condition, sug-gesting a specific response in the latter group. For example,VITAMIN C DEFECTIVE5 (VTC5), which encodes a GDP-L-galactose phosphorylase and has a role in response to waterdeprivation (Heyndrickx and Vandepoele, 2012), showed sig-nificant overexpression of the Col allele and a lower increase ofthe Cvi allele in response to drought stress, as opposed tocontrol conditions where VTC5 alleles showed no differences(Figure 5A).

Environmental differences can affect gene expression in-dependently of the genotype or alter the specific response of anallelic variant. To determine the extent to which our differenttreatments modulates the effect of cis- and trans-regulatoryfactors to influence gene expression, we extended the variance

component analysis to include direct environmental effects andgenotype x environment (GxE) interactions involving cis- (Gc x E)or trans-regulatory factors (Gt x E). Genome wide, this analysisidentified 2348 genes that were subject to significant en-vironmental regulation, out of which 1618 genes had significantvariation explained by GxE interactions (>10% variance in eithercis or trans). Out of the 1618 genes with a significant GxE effect,919 had a significant contribution from Gc x E, 524 from Gt x Eand 175 had both types of cis and trans interactions Gct x E(Figure 5B; Supplemental Data Set 2). Among genes with a sig-nificant GxE effect, the median variance explained by cis factors(Gc x E) was larger than for trans factors (Gt x E) (24.7% versus20.1%, respectively, P < 0.0005, t test), which is consistent withthe relative importance of direct cis and trans effects. However,notably, genes with a significant Gt x E component were fre-quent (43.2% of the genes under GxE) (Figure 5B), which con-trasts with direct genetic effects, which were dominated bycis-regulation (Figure 4A).

Figure 3. ASE under Control and Stress Conditions.

(A) Diagram representing the effect of cis- and trans-regulatory variants in hybrids and parental accessions, respectively. In a diploid nucleus, a cisvariant solely influences the expression of the nearby allele of Gene X (detectable in F1 hybrid), while Gene Y allelic variants (trans factor) will influencethe expression of both alleles (detectable among parental samples).(B) Relative log2 Col/Cvi transcript accumulation ratio in F1 hybrids grown under control (x axis) and drought stress (y axis) conditions. Genes exhibitingsignificant ASE levels under both conditions (green), solely under control (blue), solely under stress (red), or none (black) are depicted.(C) Genome-wide levels of ASE under control and stress conditions in comparison to the level of local-eQTLs (potentially cis-eQTLs) previously mappedin a set of F6 RILs derived from a Cvi-0 x Col-0 cross (Cubillos et al., 2012b).

Robust cis-Regulatory Variation under Stress 4303

Page 7: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

Detection of the Directional Effect of thecis/trans-Regulation across Environments

To understand how cis- and trans-regulatory elements interact toshape gene expression between drought stress and control con-ditions, we determined whether the change between environmentswas solely a magnitude effect or a directional change in genes underGc x E andGt x E. Overall, for Gc x E genes, we observed a significantmajority of cases displaying magnitude changes rather than directionchanges (Binomial test P < 23 10216) in the effect of the environmenton allele specific expression (Figure 6A). For genes subject to Gt x E,we observed a corresponding tendency, with a significant number ofgenes with magnitude changes across environments (Binomial testP < 2 3 10216), but the percentage of genes exhibiting directionchanges was greater compared with Gc x E genes (34% versus 11%,respectively). These results indicate that, although robust cis- andtrans-regulation occurs across environments, drought stresswould especially affect the direction of the trans-regulation, butshow less of an effect on cis regulation, which tended to haveconsistent directions in 89% of the cases across conditions.

We then compared parental sequence divergence within genebodies (number of SNPs/kb) and regulatory regions (up to 1 kbupstream the transcribed sequence) in the 175 genes with bothcis and trans factors interacting with the environment (Gct x E),including genes with a stronger trans than cis interaction vari-ance (Gt>c x E, trans x environment > cis x environment) andgenes with a stronger cis than trans interaction variance (Gc>t xE, cis x environment > trans x environment). A SNP enrichmentcomparison between Gc>t x E and Gt>c x E genes did not showevidence of a significant difference between groups within genebodies (9.7 versus 7.9, respectively; P < 0.16, t test) or regula-tory regions (10.4 versus 8.6, respectively; P < 0.2, t test; Figure6B). A similar comparison between genes with changes in ex-pression divergence across environments purely in cis (Gc x E >0.1 > Gt x E) and purely in trans (Gt x E > 0.1 > Gc x E) amplifiedthe previous tendency in regulatory regions (9.54 versus 8.71,respectively; P = 0.06, t test) or within gene bodies (8.5 versus7.7, respectively; P = 0.04, t test), with a marginally significantdifference in both cases. Moreover, when we analyzed SNP

Figure 5. GxE Interactions.

(A) Normalized read counts in F1 hybrids for VTC5 under control and stress conditions. Each dot depicts a single read count; the dashed lines highlightthe mean response to treatment.(B) Bar graph representing the fraction of genes significantly affected by either or both types of regulation in interaction with the environment, based ona variance component model.

Figure 4. Regulatory Contributions and Interactions for the Genotype Component.

(A) Bar graph representing the fraction of genes at different variance cutoffs affected by either or both types of regulation depending solely on thegenotype of the allelic variant based on a variance component model.(B) Bar graph representing the fraction of genes affected in both cis and trans (Gct) where cis- and trans-regulatory effects on transcript accumulationhave either convergent or compensatory directions.

4304 The Plant Cell

Page 8: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

density independently of the environment, we found in geneswith an exclusively cis component (Gc > 0.1 > Gt) a significantSNP enrichment within gene bodies (10.21 versus 7.24, re-spectively, P < 0.0001, t test) and regulatory regions (10.25versus 8.78, respectively, P < 0.0001, t test) over those genessolely under trans-regulation. Together, these results indicate anexcess of polymorphisms within regulatory and coding regionsin genes under cis-regulatory variation and, to a lower extent,those cis variants interacting with the environment.

DISCUSSION

Here, we have shown how allele-specific expression profiling intwo divergent A. thaliana accessions and across different envi-ronments allows for estimation of the contribution that cis- andtrans-regulatory variants make to variability in gene expression.Notably, we extended previous experimental designs by ex-plicitly controlling for a stress-related environmental factor tostudy the direct contribution on gene expression and inter-actions with the genotype. Genome-wide estimates of allele-specific transcript abundance were obtained using RNA-seqread counts from individual F1 hybrids and pools of parentsbetween Col-0 and Cvi-0 accessions grown under finely con-trolled environmental conditions.

In order to perform an exhaustive genome-wide ASE analysis,it is important to build on comprehensive and accurate geno-mic information from both parental accessions. To this end,

resequencing using Illumina short read technologies allowed usto obtain a catalog of Cvi-0 genetic variants and identify high-quality unambiguous SNPs between our Col-0 and the TAIR10reference Col-0 accession (Figure 1). Likely, all these SNPsdiffering between Col-0 stocks correspond to spontaneousmutations fixed in our Col-0 line after it was split from the ref-erence Col-0 (Ossowski et al., 2010), rather than representingerrors in the TAIR10 reference sequence. We used the catalog ofCol/Cvi SNPs mapping to 26,698 genes to assign individualRNA-seq reads to either genome. After applying a stringent filterfor genes with low expression and genes where we are not ableto quantify properly the expression level for one of the two al-leles, we were able to quantify ASE for 11,003 genes, whichrepresent ;60% of the genes expressed in 3-week-old rosettesgrown under our conditions (Figure 2). We analyzed the ASE bytwo complementary strategies. First, we applied a well-establishedbinomial test (FDR < 5%) to hybrid data and found that, underboth conditions, over 40% of the interrogated genes showedsignificant evidence of being alternatively regulated due topolymorphisms acting in cis (Figure 3C). Strikingly, a majorityof these genes (3318) showed ASE under both conditions, in-dicating a prevalent cis-effect across these two environments.Second, we used a variance component model to dissect thecontribution of the variation in transcript accumulation from cis(Gc), trans (Gt), and environmental (E) factors. In our model, weassumed that changes in the absolute abundance of a particulartranscript may be due to: (1) genetic factors (the Gc, Gt, or Gct

Figure 6. Effect Change and Diversity at Genes under GxE Regulation.

(A) Distribution of genes under cis x E (Gc x E) or trans x E (Gt x E) interaction, according to the quality of the change in regulation across treatment:either a change in the magnitude of the effect or in the direction of the effect.(B) Bar graph representing the average number of SNPs within regulatory regions (orange bars) and body (purple bars) of genes regulated underdifferent scenarios of Genotype and GxE components. Error bars represent SD. *P < 0.05 and ***P < 0.001; t test.

Robust cis-Regulatory Variation under Stress 4305

Page 9: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

contribution), (2) environmental factors (E), or (3) the interactionbetween genetic and environmental factors at each level (GxE).The results from our model suggest that 46.7% of the genes hadalternative cis-regulatory variants depending on the accessionand independently of the environment (>10% variance), inagreement with our binomial test. This number of genes ex-hibiting ASE corresponded to about three times that previouslyfound in a Col-0 x Vancouver hybrid, in experiments usinga different strategy that probably resulted in lower power (Zhangand Borevitz, 2009). An interspecific cross (Arabidopsis lyrata xA. thaliana) also identified less variation at this scale and withvery different features (He et al., 2012). In line with recent reportsin mouse (Goncalves et al., 2012), our ASE-based estimatesconfirm the reduced influence of trans variation, as has beenestimated from population-level analysis (Pickrell et al., 2010).The moderate number of genes with significant trans variationtogether with the distribution of explained variance acrosstranscripts suggests a quantitatively minor contribution of transfactors toward natural variation in gene expression (Figure 4A).These results are also in agreement with our recent eQTL surveyin Cvi-0 x Col-0 RILs, which showed more local- than distant-eQTLs at a conservative FDR, with milder effect contributions ontranscript abundance from distant factors (Cubillos et al.,2012b). The remarkable difference between the number ofgenes under cis or trans regulation suggests that variation intranscript abundance between accessions seems to be gov-erned mostly by polymorphisms near the gene, as previouslysuggested in a study of a large panel of A. thaliana accessions(Gan et al., 2011). Nevertheless, part of the imbalance betweenthe number of cis- and trans-regulatory variants in this studymight be due to statistical limitations in the detection of complexcis x trans genetic interactions in only two parental pool repli-cates per condition, versus three replicates for F1 hybrids.Furthermore, differences in statistical power to identify cis andtrans regulators could be mainly due to larger effects in cis fac-tors, which are then easier to detect (Kliebenstein, 2009). Indeed,when applying less stringent arbitrary cutoffs to the variancecomponent analysis, a number of genes previously thought to becontrolled solely in cis also appear to be significantly controlled intrans (Figure 4A), which is in line with polygenic and weak transregulation for most genes. Finally, complex compensating factors(that may be more frequent among trans than cis elements) will bedifficult to detect when comparing F1 and parental pools, sincethese will be summed in any case.

Furthermore, we attempted to validate our ASE and transestimations by pyrosequencing using a large set of genes ex-hibiting variable levels of either form of differential regulation.This approach demonstrated the accuracy of our quantitativeestimations with significantly correlated ratios between bothapproaches (Supplemental Figure 3). We successfully validatedthe 27 genes predicted to have a significant cis variance fromour binomial test. Also, 10 of 22 genes expected to be undertrans-acting variation were validated by pyrosequencing. In-consistencies between both techniques for weaker trans effectsmight arise because pools of parents used on each measure-ment were derived from different individuals, while the sameRNA samples (= same F1 individuals) where used in RNA-seqand pyrosequencing validation on hybrids.

Lately, the incidence of genes under both types of regulationin opposite directions has been extensively analyzed (Smith andKruglyak, 2008; Goncalves et al., 2012). In our study, we ob-served that the occurrence of compensatory cis- and trans-regulatory variation is not rare, as nearly half of the interrogatedgenes showed this type of regulation, independently of thecondition (Figure 4B). To add to our understanding of the majorequilibria behind the regulation of transcript accumulation, westudied a challenging environmental contrast: Our controlledmild drought stress treatment mimics a major abiotic limita-tion to growth, which plants face in the wild and have de-veloped diverse strategies to cope with (Bouchabke et al.,2008). We used DESeq to evaluate the response to stress anddetect genes differentially expressed between conditions, in-dependently of the parental origin of each read (SupplementalFigure 5). Comparison of ASE between environments showedthat cis-regulatory effects tend to be particularly stable betweenenvironments (Figure 3C). A recent study in human cell lines andprimary tissue showed that tissue-specific cis-regulatory effectsare rare (Grundberg et al., 2011), which is in line with our resultsbetween environmental conditions. Yet, the variance componentmodel allowed us to identify complex GxE interactions (Figure6A): We observed that Gc x E was still the predominant type ofinteraction, explaining a greater expression variance with re-spect to other types of GxE. Nevertheless, trans-acting factorswere significantly enriched among those involved in GxE (Figure5B) compared with their prevalence in either environment (Figure4A), suggesting that the stress treatment preferentially impactstrans regulation profiles. Overall, the variance component modelallowed us to detect more genes responding to environmentalperturbations compared with the well established DESeq tool,likely due to the power to dissociate the specific allelic responsefor each GxE interaction. A previous study similarly using twogrowth conditions in yeast established that local linkages aregenerally more stable across environments, as opposed todistant linkages, which are more likely to vary depending on thecondition (Smith and Kruglyak, 2008). Moreover, work in yeastshowed that the identification of cis variants could representa workhorse for the identification of directional regulatorychanges and natural phenotypic differences between individuals(Martin et al., 2012). Therefore, our results can serve as a test forfuture approaches aiming to tackle for instance drought stressadaptation differences between Col-0 and Cvi-0 accessions.Interestingly, in our study, the majority of genes with Gc x

E effects showed an ASE magnitude change between con-ditions rather than a change in the direction of the effect. Thus,cis-regulation can have an effect in both conditions and themagnitude of the ASE variation might be influenced by theabundance of the transcript in a specific condition. Overall, ourresults provide quantitative evidence that genes might be ad-ditionally influenced by environment-specific cis-regulatoryelements demonstrating the extent of cis variation acrossconditions. A similar analysis based on the directionality of thetrans effect on Gt x E genes showed a less dramatic trend, with34% of the genes exhibiting a change in the direction of thetrans effect between environments (Figure 6A). Distinct transregulators influencing a gene’s expression depending on theenvironment seems a plausible explanation for some of the

4306 The Plant Cell

Page 10: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

divergent effects detected in our study. The lower levels ofpolymorphisms and greater sequence constraint within regula-tory regions and gene body in Gt x E genes reaffirms that in A.thaliana, trans regulation plays an important part in response todistinct environmental conditions. An acceptable explanation forthe appearance of compensatory effects is the coevolution ofcis-regulatory changes and transcription factors able to bufferthe response of gene expression upon changes in environ-mental conditions. Coevolutionary changes and epistatic inter-actions between cis and trans regulators have been described ina single environment for worms (Barrière et al., 2012) and flies(Landry et al., 2005). Thus, conserved expression profiles couldalso be maintained across conditions due to the coordinatedresponse of coevolved alleles.

We note that our analysis is not free of limitation. The ability todetect ASE is intrinsically linked to the overall expression levelsof individual genes and the availability of heterozygous sites toquantify ASE. In addition to this challenge of lower limits ofdetection, the effect size of cis and trans genetic variants willdetermine the detection power and false discovery rates. Inparticular, in the context of population level association mappingstudies, it has repeatedly been reported that cis effects havelarger effect sizes than trans effects, which may bias the abilityto identify causal loci. We note that the ASE approach does not,in principle, suffer from this limitation as the effect of all cis or alltrans factors jointly is ascertained.

Our RNA-seq results provide broad evidence of the extent ofASE in A. thaliana. The significantly greater incidence of ASEfound in this study could be due to the low heterogeneity be-tween the examined replicate individuals (Tisné et al., 2013) andto the great resolution provided by RNA-seq to quantify allelicratios (McManus et al., 2010). Both factors converge towardmore testing power; in addition, the chosen parental accessionsare particularly genetically and epigenetically distant (Simonet al., 2012; Schmitz et al., 2013). Use of the Phenoscopeplatform has proven to finely control environmental variationsthat could introduce noise and variance between individuals,aiding the resolution in this study (Tisné et al., 2013). Moreover,studying regulatory variation using RNA-seq provides greaterdepth coverage per transcript than previous array-based ap-proaches by increasing the ability to quantify and distinguishbetween allelic variants (McManus et al., 2010; Ozsolak andMilos, 2011). Furthermore, we focused on transcript level vari-ation to delineate the response of cis- and trans-regulatory el-ements across environments, providing evidence of a largepanel of loci showing GxE interactions. Even though the numberof genes exhibiting ASE is significantly greater than previouslyreported, we believe that a larger number of individuals togetherwith tissue-specific sampling across a wider spectrum of envi-ronments could still widely extend the repertoire of genes dis-playing expression differences. The question of whether thepattern herein described is transposable to other plant organismsremains; nevertheless, our findings lay the ground for other similarstudies in more complex plant genomes. Characterizing the mo-lecular mechanisms by which cis- and trans-regulatory elementsaffect allelic expression in natural populations will represent a cru-cial step toward understanding how regulatory variants modulatetranscript accumulation and, ultimately, macroscopic phenotypes.

METHODS

Plant Material and Growth Conditions

Seeds from the INRA-Versailles resource center (http://publiclines.versailles.inra.fr/) were obtained to perform reciprocal crosses betweenArabidopsis thaliana accessions Cvi-0 (166AV) and Col-0 (186AV). Theseaccessions were chosen based on the large number of polymorphismsbetween them and the wide distribution of local-eQTLs (Cubillos et al.,2012b) (http://signal.salk.edu/atg1001/). We stratified F1 seeds and pa-rental accession for 4 d at 4°C and seeds were germinated. Plants weregrown under strictly controlled and monitored environmental conditionsas previously described (Tisné et al., 2013). Briefly, individual plants werecultivated in the Phenoscope automated phenotyping setup for 21d under short days (8 h photoperiod) and well watered (60% soil watercontent) or mild water-deficit (30% soil water content) conditions, con-trolled and maintained twice a day by automatically adjusted watering. Allplants were continuously rotated during the experiment to homogenizeenvironmental conditions and rosette images were recorded every dayand stored in the Phenoscope database for further analyses. For eachgenotype, replicates were collected above the root 21 d after sowing andstored at280°C. Images were analyzed and growth profiles representing therate of expansion of the projected rosette area were studied to identify re-maining outliers and discard them from further analysis andRNA-sequencing.

RNA Extraction, cDNA Preparation, and RNA-Sequencing

Frozen rosette tissue was pulverized in liquid nitrogen with a mortar andtotal RNA was extracted using the RNeasy Plant mini kit (Qiagen kit74904) according to the supplier’s instructions. Parental pools weregenerated by mixing 10 mg of RNA from each of two Col-0 and two Cvi-0individuals. RNA samples were then treated with DNase I to removegenomic DNA traces. For library preparation, beads with oligo(dT) wereused to isolate poly(A) mRNA. cDNA samples were prepared usinghexamer primers and the IlluminamRNA-seq sample prep kit according tothe manufacturer’s instructions (Illumina). Briefly, samples were subjectedto fragmentation and these short fragments were used as templates forcDNA first-strand preparation using random-hexamer primers. Thesecond-cDNA strand was synthesized using 13 buffer (500 mM Tris-HCl,pH 7.8, 50 mM MgCl2, and 10 mM DTT), 25 mM deoxynucleotide tri-phosphate, 2 units of RNaseH, and 50 units of DNA polymerase I. Shortfragments were purified with the QiaQuick PCR extraction kit and re-suspended in EB buffer for end repair, dA-tailing, and barcode ligation.Suitable fragments for PCR amplification were size-selected bygel electrophoresis on 2% agarose gels. cDNA was PCR amplified withIllumina primers for 12 cycles. Parental and F1 hybrids libraries weresubjected to single-end sequencing using Illumina HiSequation 2000following the manufacturer’s instructions, where every genotype replicatewas sequenced in an independent lane to avoid technical biases. TheRNA-seq data from this study are available from the NCBI’s Gene Ex-pression Omnibus under accession number GSE43560.

Cvi-0 and Col-0 DNA Isolation, Library Preparation, Sequencing, andData Analysis

Seeds from Cvi-0 and Col-0 were obtained from the same set of seedsused to generate the F1 hybrids for RNA-seq. Plants were grown ina greenhouse in typical long-day conditions (16-h photoperiod) at 20°C,andwhole plants were collected above the roots 21 d after sowing. DNAwasextracted with the CTAB method, and sequencing libraries were preparedusing the Illumina Genomic DNA Sample Prep protocol. Paired-end librarieswith average insert size of 400 nucleotides and 100-nucleotide read lengthwere generated. Sequencing was performed using Illumina HiSequation2000.

Robust cis-Regulatory Variation under Stress 4307

Page 11: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

DNA-seq reads were mapped to the Col-0 reference genome (TAIR10assembly) obtained from the TAIR website (http://www.Arabidopsis.org/)using Bowtie2 (Langmead and Salzberg, 2012) with default settings andBWA allowing up to four mismatches. SNPs were called using the variantcaller UnifiedGenotyper tool from GATK, which allows to call differ-ences from the reference sequence (DePristo et al., 2011), with options‘-max_alternate_alleles 1-genotype_likelihoods_model BOTH -stand_emit_conf 500.’ Heterozygous sites were filtered out to avoid copynumber variation or nonunique sequences. These data were used togenerate a Cvi consensus sequence.

RNA-seq Alignments and Allele-Specific Quantification ofGene Expression

RNA-seq reads were aligned to the Col-0 reference genome using PAL-Mapper, a variant-aware spliced aligner (Jean et al., 2010). The set of detectedCvi-SNPs and indels was accounted for during the alignment process tominimize read mapping biases between both genomes. Across the 10samples, between 25.5 million and 27.6 million reads could be mapped, withthe fraction of mappable reads varying between 92.8 and 93.8%.

In each sample and for each gene, allele-specific gene expression levelswere estimated from all segregating exonic SNPs. Owing to the extent ofdivergence between Cvi-0 and Col-0, individual RNA-seq reads tended tospan multiple segregating alleles. To avoid double-counting, allele-specificcounts were created on the basis of individual reads summed across thegene, ensuring consistency of the reads originating either from the Cvi-0 orCol-0 haplotype. Reads that could not be consistently assigned to eitherhaplotype were removed (on average <0.0005% of all reads). For the bi-nomial testing, raw allele-specific read counts in each sample were used.For the variance component modeling, read counts from both genomeswere separately normalized, adjusting for differences in library size (usingthe count median as in Anders and Huber, 2010) and treated as separateRNA quantification experiments (Supplemental Figure 2).

ASE Tests

For every F1 hybrid replicate (Col-0 x Cvi-0 and Cvi-0 x Col-0), we initiallysummed all the read counts per allele (either Col or Cvi) per gene andperformed a binomial exact test against the null hypothesis of no allele-specific expression to test for cis-regulatory divergence. Genes witheither no count for any of the alleles or less than five allele-specific readsfor each group were discarded from the analysis. Tests were performed inthe R statistical environment using the binom.test function (http://www.R-project.org). Resulting P values were corrected for multiple testing usingthe q-value package (Storey and Tibshirani, 2003). The obtained q-valuerepresented the minimum FDR at which each individual ASE event re-mains significant.

Variance Component Model

Estimation of variance was performed by jointly fitting a mixed linearmodel to the allele-specific read count estimates of a particular geneacross all samples. To estimate variance parameters, the count in-formation was modeled using a linear mixed model applied to variancestabilized expression levels (Supplemental Text 1). Overdispersion pa-rameters were fit using the approach used in DESeq (Anders and Huber,2010), employing a separate variance function for parental pools, andhybrids in control and stress conditions. The full regression modelcontained covariates that correspond to the environmental state (stress/control), the allele at the quantification site for ASE (cis effect), and anindicator variable whether the sample is a parental pool or an F1 (transeffect). To ascertain GxE effects, the cis and trans covariates were split tomodel differences of the effect in the control and stress condition, re-sulting in five covariates.We fit this model to the allele-specific expression

estimates of each gene, using the read depth information for the Col andCvi allele (20 observations per gene). Briefly, the difference between al-leles (Cvi/Col) is used to estimate the contribution of cis genetic effects,whereas the deviation between the pools and the F1 lines can be used todetermine the trans genetic component of gene expression variation.Further details of the variance component model are provided inSupplemental Text 1.

Pyrosequencing and Real-Time PCR for RNA-seq Validation

Pyrosequencing (PyroMark24; Qiagen) was used to estimate relativeallele-specific expression and validate results obtained by RNA-seq. RNAfrom sequenced samples in hybrids and new replicates of pool of parentswere used to generate cDNA. cDNA was prepared using the Fermentasfirst-strand cDNA synthesis kit with poly-T primers and diluted five timesfor PCR. F1 genomic DNA was used for technical control as previouslyindicated (Landry et al., 2005). PCR of cDNA was performed for 34 geneswith variable levels of ASE. Pyrosequencing reactions were performed aspreviously described (Pineau et al., 2012). Briefly, pyrosequencing re-actions were performed using SNPs representative of the ASE betweenCol/Cvi obtained by RNA-seq to independently assess the relativecontribution of each allele. Pyrosequencing was performed on cDNA fromsix F1 individuals (three from each condition), four pools of parents (twofrom each condition), and on F1 genomic DNA as a control to normalizethe ratios against possible pyrosequencing or PCR biases.

Real-time PCR was performed with 3 mL of the diluted cDNA and 13the MESA FAST qPCR Plus for SYBR Green (Eurogentec) kit in a 25-mLreaction volume. Primers sequences are indicated in Supplemental Table2. Primer amplification efficiency was checked in all cases. The UBC21gene (At5g25760) was used as control and relative expression ratios wereestimated as previously described using the 22DDCt method (Livak andSchmittgen, 2001).

Accession Numbers

Sequence data from this article can be found in the Arabidopsis GenomeInitiative or GenBank/EMBL databases under the following accessionnumbers: VTC5, At5g55120; At2g38870; PR5, At1g75040; HSP70,At3g12580; MOT1, At2g25680; QQS, At3g30720; At5g55450; andUBC21, At5g25760.

Supplemental Data

The following materials are available in the online version of this article.

Supplemental Figure 1. RPKM Scatterplot for Reads Spanning SNPs.

Supplemental Figure 2. Percentage of Reads Specifically Aligning toCol (Red) and Cvi (Blue) Genomes before and after the NormalizationProcedure.

Supplemental Figure 3. Comparison of Allele-Specific ExpressionRatios Obtained from RNA-seq or Pyrosequencing in DifferentSamples and Conditions.

Supplemental Figure 4. Overlap Comparison for the Additive EffectEstimated from Local-eQTL and Genes under Significant Allele-Specific Expression.

Supplemental Figure 5. Expression Changes across Conditions in F1and Parental Pools Estimated by DESeq.

Supplemental Table 1. GO Term Enrichment for Genes DifferentiallyExpressed between Conditions in F1 Hybrids.

Supplemental Table 2. Sequencing Primers Used in the qPCRValidation.

Supplemental Text 1. Variance Component Modeling.

4308 The Plant Cell

Page 12: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

Supplemental Data Set 1. Polymorphisms Detected in Cvi-0 and Col-0.

Supplemental Data Set 2. ASE, Variance Component, and DESeqResults Summary for 11,003 Genes.

ACKNOWLEDGMENTS

We thank Elodie Gilbault and Liên Bach for technical help with thePhenoscope platform. This work was supported by funding from theEuropean Commission Framework Programme 7, ERC Starting GrantDECODE/ERC-2009-StG-243359 to O.L., and ANR Grant 2Complex/ANR-09-BLAN-0366 to O.L. O.S. was supported by a Marie Curie FP7fellowship.

AUTHOR CONTRIBUTIONS

F.A.C. and O.L. designed the research. F.A.C., C.G., M.C., and I.G.performed research. S.T. contributed new analytic/computational tools.F.A.C. and O.S. analyzed data. F.A.C., O.S., and O.L. wrote the article.

Received July 22, 2014; revised October 22, 2014; accepted November8, 2014; published November 26, 2014.

REFERENCES

Alonso-Blanco, C., Aarts, M.G., Bentsink, L., Keurentjes, J.J.,Reymond, M., Vreugdenhil, D., and Koornneef, M. (2009). Whathas natural variation taught us about plant development, physiol-ogy, and adaptation? Plant Cell 21: 1877–1896.

Anders, S., and Huber, W. (2010). Differential expression analysis forsequence count data. Genome Biol. 11: R106.

Barrière, A., Gordon, K.L., and Ruvinsky, I. (2012). Coevolutionwithin and between regulatory loci can preserve promoter functiondespite evolutionary rate acceleration. PLoS Genet. 8: e1002961.

Baxter, I., Muthukumar, B., Park, H.C., Buchner, P., Lahner, B.,Danku, J., Zhao, K., Lee, J., Hawkesford, M.J., Guerinot, M.L.,and Salt, D.E. (2008). Variation in molybdenum content acrossbroadly distributed populations of Arabidopsis thaliana is controlledby a mitochondrial molybdenum transporter (MOT1). PLoS Genet.4: e1000004.

Baxter, I., Brazelton, J.N., Yu, D., Huang, Y.S., Lahner, B.,Yakubova, E., Li, Y., Bergelson, J., Borevitz, J.O., Nordborg,M., Vitek, O., and Salt, D.E. (2010). A coastal cline in sodium ac-cumulation in Arabidopsis thaliana is driven by natural variation ofthe sodium transporter AtHKT1;1. PLoS Genet. 6: e1001193.

Bouchabke, O., Chang, F., Simon, M., Voisin, R., Pelletier, G., andDurand-Tardif, M. (2008). Natural variation in Arabidopsis thalianaas a tool for highlighting differential drought responses. PLoS ONE 3, e1705.

Cao, J., et al. (2011). Whole-genome sequencing of multiple Arabi-dopsis thaliana populations. Nat. Genet. 43: 956–963.

Cubillos, F.A., Coustham, V., and Loudet, O. (2012a). Lessons fromeQTL mapping studies: non-coding regions and their role behind naturalphenotypic variation in plants. Curr. Opin. Plant Biol. 15: 192–198.

Cubillos, F.A., et al. (2012b). Expression variation in connected re-combinant populations of Arabidopsis thaliana highlights distincttranscriptome architectures. BMC Genomics 13: 117.

DePristo, M.A., et al. (2011). A framework for variation discovery andgenotyping using next-generation DNA sequencing data. Nat.Genet. 43: 491–498.

Emerson, J.J., Hsieh, L.C., Sung, H.M., Wang, T.Y., Huang, C.J.,Lu, H.H., Lu, M.Y., Wu, S.H., and Li, W.H. (2010). Natural selectionon cis and trans regulation in yeasts. Genome Res. 20: 826–836.

Feil, R., and Berger, F. (2007). Convergent evolution of genomicimprinting in plants and mammals. Trends Genet. 23: 192–199.

Gan, X., et al. (2011). Multiple reference genomes and transcriptomesfor Arabidopsis thaliana. Nature 477: 419–423.

Gehring, M., Missirian, V., and Henikoff, S. (2011). Genomic analysisof parent-of-origin allelic expression in Arabidopsis thaliana seeds.PLoS ONE 6: e23687.

Goncalves, A., Leigh-Brown, S., Thybert, D., Stefflova, K., Turro,E., Flicek, P., Brazma, A., Odom, D.T., and Marioni, J.C. (2012).Extensive compensatory cis-trans regulation in the evolution ofmouse gene expression. Genome Res. 22: 2376–2384.

Grundberg, E., et al. (2011). Global analysis of the impact of envi-ronmental perturbation on cis-regulation of gene expression. PLoSGenet. 7: e1001279.

He, F., Zhang, X., Hu, J., Turck, F., Dong, X., Goebel, U., Borevitz,J., and de Meaux, J. (2012). Genome-wide analysis of cis-regulatorydivergence between species in the Arabidopsis genus. Mol. Biol. Evol.29: 3385–3395.

Heyndrickx, K.S., and Vandepoele, K. (2012). Systematic identifi-cation of functional plant modules through the integration of com-plementary data sources. Plant Physiol. 159: 884–901.

Jean, G., Kahles, A., Sreedharan, V.T., De Bona, F., and Ratsch, G.(2010). RNA-Seq read alignments with PALMapper. Curr. Protoc.Bioinformatics 11: 11.6.

Jiménez-Gómez, J.M., Wallace, A.D., and Maloof, J.N. (2010).Network analysis identifies ELF3 as a QTL for the shade avoidanceresponse in Arabidopsis. PLoS Genet. 6: e1001100.

Kerwin, R.E., Jimenez-Gomez, J.M., Fulop, D., Harmer, S.L.,Maloof, J.N., and Kliebenstein, D.J. (2011). Network quantita-tive trait loci mapping of circadian clock outputs identifies meta-bolic pathway-to-clock linkages in Arabidopsis. Plant Cell 23:471–485.

Kliebenstein, D. (2009). Quantitative genomics: analyzing in-traspecific variation using global gene expression polymorphismsor eQTLs. Annu. Rev. Plant Biol. 60: 93–114.

Lagarrigue, S., Martin, L., Hormozdiari, F., Roux, P.F., Pan, C., vanNas, A., Demeure, O., Cantor, R., Ghazalpour, A., Eskin, E., andLusis, A.J. (2013). Analysis of allele-specific expression in mouseliver by RNA-Seq: a comparison with Cis-eQTL identified usinggenetic linkage. Genetics 195: 1157–1166.

Landry, C.R., Wittkopp, P.J., Taubes, C.H., Ranz, J.M., Clark, A.G.,and Hartl, D.L. (2005). Compensatory cis-trans evolution and thedysregulation of gene expression in interspecific hybrids of Dro-sophila. Genetics 171: 1813–1822.

Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read align-ment with Bowtie 2. Nat. Methods 9: 357–359.

Livak, K.J., and Schmittgen, T.D. (2001). Analysis of relative geneexpression data using real-time quantitative PCR and the 2(-DeltaDelta C(T)) method. Methods 25: 402–408.

Lowry, D.B., Logan, T.L., Santuari, L., Hardtke, C.S., Richards,J.H., DeRose-Wilson, L.J., McKay, J.K., Sen, S., and Juenger,T.E. (2013). Expression quantitative trait locus mapping acrosswater availability environments reveals contrasting associationswith genomic features in Arabidopsis. Plant Cell 25: 3266–3279.

Martin, H.C., Roop, J.I., Schraiber, J.G., Hsu, T.Y., and Brem, R.B.(2012). Evolution of a membrane protein regulon in Saccharomyces.Mol. Biol. Evol. 29: 1747–1756.

McManus, C.J., Coolon, J.D., Duff, M.O., Eipper-Mains, J.,Graveley, B.R., and Wittkopp, P.J. (2010). Regulatory divergencein Drosophila revealed by mRNA-seq. Genome Res. 20: 816–825.

Robust cis-Regulatory Variation under Stress 4309

Page 13: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

Ossowski, S., Schneeberger, K., Lucas-Lledó, J.I., Warthmann, N.,Clark, R.M., Shaw, R.G., Weigel, D., and Lynch, M. (2010). Therate and molecular spectrum of spontaneous mutations in Arabi-dopsis thaliana. Science 327: 92–94.

Ozsolak, F., and Milos, P.M. (2011). RNA sequencing: advances,challenges and opportunities. Nat. Rev. Genet. 12: 87–98.

Parts, L., Stegle, O., Winn, J., and Durbin, R. (2011). Joint geneticanalysis of gene expression data with inferred cellular phenotypes.PLoS Genet. 7: e1001276.

Paschold, A., Jia, Y., Marcon, C., Lund, S., Larson, N.B., Yeh, C.T.,Ossowski, S., Lanz, C., Nettleton, D., Schnable, P.S., andHochholdinger, F. (2012). Complementation contributes to tran-scriptome complexity in maize (Zea mays L.) hybrids relative to theirinbred parents. Genome Res. 22: 2445–2454.

Pickrell, J.K., Marioni, J.C., Pai, A.A., Degner, J.F., Engelhardt, B.E.,Nkadori, E., Veyrieras, J.B., Stephens, M., Gilad, Y., and Pritchard, J.K.(2010). Understanding mechanisms underlying human gene expressionvariation with RNA sequencing. Nature 464: 768–772.

Pineau, C., Loubet, S., Lefoulon, C., Chalies, C., Fizames, C.,Lacombe, B., Ferrand, M., Loudet, O., Berthomieu, P., andRichard, O. (2012). Natural variation at the FRD3 MATE transporter lo-cus reveals cross-talk between Fe homeostasis and Zn tolerance inArabidopsis thaliana. PLoS Genet. 8: e1003120.

Roy, S.J., Huang, W., Wang, X.J., Evrard, A., Schmöckel, S.M.,Zafar, Z.U., and Tester, M. (2013). A novel protein kinase involvedin Na(+) exclusion revealed from positional cloning. Plant Cell En-viron. 36: 553–568.

Schmitz, R.J., Schultz, M.D., Urich, M.A., Nery, J.R., Pelizzola, M.,Libiger, O., Alix, A., McCosh, R.B., Chen, H., Schork, N.J., andEcker, J.R. (2013). Patterns of population epigenomic diversity.Nature 495: 193–198.

Schneeberger, K., et al. (2011). Reference-guided assembly of fourdiverse Arabidopsis thaliana genomes. Proc. Natl. Acad. Sci. USA108: 10249–10254.

Scott, R.J., Spielman, M., Bailey, J., and Dickinson, H.G. (1998).Parent-of-origin effects on seed development in Arabidopsis thali-ana. Development 125: 3329–3341.

Simon, M., Simon, A., Martins, F., Botran, L., Tisné, S., Granier, F.,Loudet, O., and Camilleri, C. (2012). DNA fingerprinting and newtools for fine-scale discrimination of Arabidopsis thaliana ac-cessions. Plant J. 69: 1094–1101.

Skelly, D.A., Johansson, M., Madeoy, J., Wakefield, J., and Akey,J.M. (2011). A powerful and flexible statistical framework for testinghypotheses of allele-specific gene expression from RNA-seq data.Genome Res. 21: 1728–1737.

Smith, E.N., and Kruglyak, L. (2008). Gene-environment interaction inyeast gene expression. PLoS Biol. 6: e83.

Storey, J.D., and Tibshirani, R. (2003). Statistical significance forgenomewide studies. Proc. Natl. Acad. Sci. USA 100: 9440–9445.

Studer, A., Zhao, Q., Ross-Ibarra, J., and Doebley, J. (2011).Identification of a functional transposon insertion in the maize do-mestication gene tb1. Nat. Genet. 43: 1160–1163.

Tisné, S., et al. (2013). Phenoscope: an automated large-scale phe-notyping platform offering high spatial homogeneity. Plant J. 74:534–544.

Trontin, C., Tisné, S., Bach, L., and Loudet, O. (2011). What doesArabidopsis natural variation teach us (and does not teach us) aboutadaptation in plants? Curr. Opin. Plant Biol. 14: 225–231.

West, M.A., Kim, K., Kliebenstein, D.J., van Leeuwen, H.,Michelmore, R.W., Doerge, R.W., and St Clair, D.A. (2007).Global eQTL mapping reveals the complex genetic architecture oftranscript-level variation in Arabidopsis. Genetics 175: 1441–1450.

Wittkopp, P.J., Haerum, B.K., and Clark, A.G. (2004). Evolutionarychanges in cis and trans gene regulation. Nature 430: 85–88.

Zhang, X., and Borevitz, J.O. (2009). Global analysis of allele-specificexpression in Arabidopsis thaliana. Genetics 182: 943–954.

Zhu, J., Sova, P., Xu, Q., Dombek, K.M., Xu, E.Y., Vu, H., Tu, Z.,Brem, R.B., Bumgarner, R.E., and Schadt, E.E. (2012). Stitchingtogether multiple data dimensions reveals interacting metabolomicand transcriptomic networks that modulate cell regulation. PLoSBiol. 10: e1001301.

Ziebarth, J.D., Cook, M.N., Wang, X., Williams, R.W., Lu, L., andCui, Y. (2012). Treatment- and population-dependent activity pat-terns of behavioral and expression QTLs. PLoS ONE 7: e31805.

4310 The Plant Cell

Page 14: Extensive cis-Regulatory Variation Robust to Environmental ... · estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and

DOI 10.1105/tpc.114.130310; originally published online November 26, 2014; 2014;26;4298-4310Plant Cell

Olivier LoudetFrancisco A. Cubillos, Oliver Stegle, Cécile Grondin, Matthieu Canut, Sébastien Tisné, Isabelle Gy and

Arabidopsis-Regulatory Variation Robust to Environmental Perturbation in cisExtensive

 This information is current as of April 2, 2020

 

Supplemental Data /content/suppl/2014/11/24/tpc.114.130310.DC1.html

References /content/26/11/4298.full.html#ref-list-1

This article cites 50 articles, 17 of which can be accessed free at:

Permissions https://www.copyright.com/ccc/openurl.do?sid=pd_hw1532298X&issn=1532298X&WT.mc_id=pd_hw1532298X

eTOCs http://www.plantcell.org/cgi/alerts/ctmain

Sign up for eTOCs at:

CiteTrack Alerts http://www.plantcell.org/cgi/alerts/ctmain

Sign up for CiteTrack Alerts at:

Subscription Information http://www.aspb.org/publications/subscriptions.cfm

is available at:Plant Physiology and The Plant CellSubscription Information for

ADVANCING THE SCIENCE OF PLANT BIOLOGY © American Society of Plant Biologists