Transcript Profiling in Maize Inflorescences using High-throughput RNA-Sequencing

Andrea L. Eveland
Cold Spring Harbor Laboratory
D. Jackson lab, Plant Genetics
D. Ware lab, Bioinformatics & Genomics

Corn Breeders’ School
March 7-8, 2011
Champaign, IL


Maize as a model for grain yield and biomass

Maize inflorescence development: meristem determinacy and branch patterning

Tassel and ear meristem types (figure labels):

• Inflorescence meristem (IM)
• Branch meristem (BM) (tassel only)
• Spikelet pair meristem (SPM)
• 2 Spikelet meristems (SM)
• 4 Floral meristems (FM)

ramosa (ra) mutants of maize:

• display highly branched phenotypes

• indeterminacy in SPM

• influenced by genetic factors

• RA1 implicated in evolution of inflorescence architecture in grasses

Axillary Meristem Determinacy (pathway diagram):

• RA1 (ramosa1): Zn finger TF
• RA2 (ramosa2): LOB TF
• RA3 (ramosa3): trehalose-P phosphatase

• RA1 and RA3 co-localize to arc of cells just below SPM

• Suggested to act non-cell autonomously, possibly by a mobile signal

• RA1 is epistatic to RA2 and RA3

[Figure: RA1 and RA3 expression panels; RA3 = black, KN1 = red. Satoh-Nagasawa et al.]


Overview

• Genome-wide transcript profiling using RNA-seq

• Systems level analyses

• Comparative genomics

• Moving forward: Integrating heterogeneous data types

• Co-expression

• Developmental series

• Genetic perturbation

• Prioritizing candidate gene selection: branch architecture

• Integration of RNA-seq data sets and genetic information


Developmental progression along maize inflorescence: capture expression profiles in time and space

Experimental System: Sampling for RNA-seq

10mm ear: dissected specific meristem types (*R. Schmidt, S. Stanfield)

• IM/SPM (Tip)
• SM (Mid)
• FM (Base)
• remaining segments discarded (figure scale labels: 1mm, 2mm, 3mm)

Developing tassel libraries: ≤2mm, 3-4mm, 5-7mm (*E. Vollbrecht, E. Unger-Wallace)

• whole tassel samples
• pools of ~10 tassels

Mutant ears collected at 2 developmental stages (*with Sasha Goldshmidt)

[Figure: wild-type, ra1, ra2 and ra3 ears at 1mm and 2mm]

Staging before and after visible phenotype

• Genotyped in field

• 5-10 ears per sample

• 3 biological replicates

• Validation by Q-PCR

RNA-seq library construction

RNA-seq PE library construction:

• Isolate poly-A RNA
• Shear RNAs
• Random priming
• DS cDNA fragments (200nt)
• PE sequencing (50nt from each end)

Paired-end RNA-Sequencing on Illumina GA2

*50nt PE sequenced, 200bp inserts

RNA-seq pipeline using the open-source “Tuxedo suite” (**Trapnell et al., 2010):

Inputs:

• Reference genome: AGP_v2
• Known maize gene models (build 5a.59)

Steps:

• Bowtie / Tophat: map paired reads to genome
• Bowtie / Tophat: assemble consensus of coverage
• Bowtie / Tophat: generate splice site index based on gene models
• Bowtie / Tophat: map to splice junctions
• Cufflinks: stitch consensus regions into transcripts
• Cuffcompare: build structural equivalence classes of transcripts
• Cuffdiff: determine differential expression

*FPKM: Fragments Per Kilobase exon per Million reads
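To make the FPKM unit concrete, here is a minimal Python sketch of the arithmetic behind it. The function name and the example numbers are hypothetical and are not taken from the slides. Cufflinks estimates abundances with a statistical model of its own, so this only illustrates what the unit means, not how the tool computes it.

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of exon per Million mapped fragments.

    fragments: fragments assigned to one transcript (hypothetical input)
    exon_length_bp: summed exon length of that transcript, in base pairs
    total_mapped_fragments: library size, i.e. all mapped fragments
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Hypothetical example: 500 fragments on a 2 kb transcript in a 20M-fragment library.
example_fpkm = fpkm(fragments=500, exon_length_bp=2_000, total_mapped_fragments=20_000_000)
print(example_fpkm)
```

Dividing by length and by library size is what lets values be compared between long and short transcripts and between libraries of different depth.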

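Current RNA-seq datasets:

Species   Category                          Sample              reps
Maize     Ear (10mm) developmental series   Ear tip (IM/SPM)    2
Maize     Ear (10mm) developmental series   Ear mid (SM)        2
Maize     Ear (10mm) developmental series   Ear base (FM)       2
Maize     Tassel developmental series       Tassel ≤ 2mm        1
Maize     Tassel developmental series       Tassel 3-4mm        1
Maize     Tassel developmental series       Tassel 5-7mm        1
Maize     Mutant (1mm ear) series           wt 1mm ear          2
Maize     Mutant (1mm ear) series           ra1 1mm ear         2
Maize     Mutant (1mm ear) series           ra2 1mm ear         2
Maize     Mutant (1mm ear) series           ra3 1mm ear         3
Maize     Mutant (2mm ear) series           wt 2mm ear          3
Maize     Mutant (2mm ear) series           ra1 2mm ear         3
Maize     Mutant (2mm ear) series           ra2 2mm ear         3
Maize     Mutant (2mm ear) series           ra3 2mm ear         3
Sorghum                                     Inflorescence 2mm   1
Sorghum                                     Vegetative SAM      1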

Mapping Summary Statistics

Sample             reps  mapped reads  # genes  # transcripts  > 5 FPKM  > 2 FPKM
Ear tip (IM/SPM)   2     18.8M         20,040   37,679         14,988    22,396
Ear mid (SM)       2     21.9M         22,260   42,051         16,994    24,927
Ear base (FM)      2     21.3M         22,318   42,020         17,136    25,046
Tassel ≤ 2mm       1     14.5M         22,725   42,216         15,153    22,423
Tassel 3-4mm       1     14.7M         23,605   43,846         15,971    23,520
Tassel 5-7mm       1     14.5M         24,631   45,662         18,170    26,181

[Figure: FPKM scatter plots. Panels: ear tip_1 vs. ear tip_2; ear tip_1 vs. ear base_1. Reported correlations: r = 0.89, r = 0.75]
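A correlation like the r values reported for the scatter plots can be obtained along the following lines. This is a sketch only: the slides do not say whether r was computed on raw or transformed FPKM, so plain Pearson correlation on raw values is an assumption, and the FPKM lists below are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene FPKM values for two replicate libraries.
rep1_fpkm = [0.0, 2.5, 10.0, 40.0, 120.0]
rep2_fpkm = [0.5, 3.0, 8.0, 45.0, 110.0]
replicate_r = pearson_r(rep1_fpkm, rep2_fpkm)
print(round(replicate_r, 3))
```

Because a few highly expressed genes can dominate Pearson r on raw values, a log transform or a rank-based correlation is a common alternative.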

Maize Filtered Gene Set (FGS; gene build 5a):

• Total high-confidence genes: 39,656
• Transcripts per gene: 1.60
• Exons per transcript: 5.21

Genes identified as expressed in ear and tassel developmental series:

• FGS: 26,319 expressed (99% protein coding)
• Working Gene Set: 10,411 additional genes expressed: protein coding (66%), pseudogene (15%), transposable element (19%)

[Figure: bar charts of total # genes expressed per FPKM bin (>1000, 100-1000, 30-100, 2 to 30, <2) for each gene set]


Organ-specificity

[Venn diagram, ear vs. tassel; counts as extracted: 27,468; 405; 669]

[Venn diagram, inflorescence vs. vegetative (shoot apical meristem + leaf); counts as extracted: 94; 16,137; 28,401]

• tassel-specific genes may help identify key regulators in branching

• inflorescence libraries compared with publicly available RNA-seq data sets
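Venn-style counts of organ-specific and shared genes can be derived with plain set operations, as in the sketch below. The gene names, FPKM values and the 2 FPKM cutoff are all hypothetical; the slides do not state which expression threshold defined "expressed" for the Venn diagrams.

```python
def expressed_genes(fpkm_by_gene, threshold):
    """Return the set of genes whose FPKM is strictly above the threshold."""
    return {gene for gene, value in fpkm_by_gene.items() if value > threshold}


def venn_counts(set_a, set_b):
    """Count genes specific to each set and genes shared by both."""
    return {
        "a_only": len(set_a - set_b),
        "shared": len(set_a & set_b),
        "b_only": len(set_b - set_a),
    }


# Hypothetical FPKM tables for two organs.
ear_fpkm = {"geneA": 12.0, "geneB": 0.4, "geneC": 7.1, "geneD": 3.0}
tassel_fpkm = {"geneA": 9.5, "geneB": 6.2, "geneC": 0.0, "geneD": 2.0}

THRESHOLD = 2.0  # assumed cutoff, chosen only for illustration
ear_set = expressed_genes(ear_fpkm, THRESHOLD)
tassel_set = expressed_genes(tassel_fpkm, THRESHOLD)
counts = venn_counts(ear_set, tassel_set)
tassel_specific = sorted(tassel_set - ear_set)
print(counts, tassel_specific)
```

The same two functions would also cover the inflorescence vs. vegetative comparison, given one FPKM table per tissue group.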

Using comparative genomics to classify expressed genes and prioritize candidates:

Genes expressed in inflorescences, by lineage of origin:

• Eukaryota (32%)
• Embryophyta (47%)
• Magnoliophyta (11%)
• Poaceae (6.5%)
• Andropogoneae (0.5%)
• Maize-specific (3%)

*EnsemblCompara gene trees: gramene.org; maizesequence.org

Comparative profiling in Sorghum

• Highly branched inflorescence
• Model for drought tolerance
• Comparative genomics

tissue          PE reads  % mapped  >2FPKM  >5FPKM
inflorescence   32M       86%       80%     62%
SAM             22.7M     89%       78%     60%

17,513 genes expressed in sorghum libraries > 0.5 FPKM:

[Venn diagram, shoot apical meristem vs. inflorescence; counts as extracted: 339; 735; 16,439]

*22 sorghum orthologs of tassel-specific genes are expressed exclusively in the inflorescence compared to SAM

Transcript Isoform specificity

[Venn diagram, ear vs. tassel transcripts; counts as extracted: 45,671; 6,445; 1,405]

*~450 loci with tassel-specific transcripts > 10 FPKM

[Figure: example locus ARF8]

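TF expression signatures in ear meristem types:

[Figure: heat maps with columns IM/SPM, SM, FM and a # TFs column per expression pattern. # TFs labels as extracted, first panel: 13, 0, 1005, 53, 2, 15, 2; second panel (differentially expressed TFs): 21, 134, 11]

*Differentially expressed during development

**Enrichment: 0.5 FPKM > < 0.05 FPKM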

Differentially expressed TFs during ear development

TF          Gene id          Functional description
AUX-IAA     GRMZM2G159285    IAA13
AUX-IAA     GRMZM2G163848    IAA5
ARF         GRMZM2G078274    uncharacterized
C2H2        GRMZM2G134759    uncharacterized
C2H2        GRMZM2G100146    uncharacterized
C2C2-GATA   GRMZM2G397616    TSH1-duplicate
C3H         GRMZM2G025014    Splicing factor U2af 38kDa
C3H         GRMZM2G086614    RNA binding protein
HB          GRMZM2G001289    uncharacterized
HB          GRMZM2G154641    uncharacterized
HB          GRMZM2G056600    uncharacterized
LFY         GRMZM2G180190    Floricaula/leafy-like 2
SBP         GRMZM2G113779    uncharacterized
SBP         GRMZM2G101511    Teosinte glume architecture1

[Figure: heat map of expression (high to low) for each TF in IM/SPM, SM and FM]

**validation based on marker gene expression



Cluster genes by expression profiles in the different genotypes

wt ra1 ra2 ra3

*identify putative genes related to a RA1-RA3-dependent pathway

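Differentially expressed genes in ramosa mutants

[Venn diagrams of differentially expressed genes in ra1, ra2 and ra3 at 1mm and 2mm. Labels and counts in extraction order: 773 | ra3 | 1,642 | ra1 | 1,641 | ra2 | 1,619 | ra3 | 1,033, 663, 170 | ra1 | 3,301 | ra2 | 2,155. Region counts: 377, 209, 158, 333, 260, 502, 1,156, 64, 1,312, 272, 136]

*322 genes differentially expressed in all mutants at 1mm + 2mm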

% up- and down-regulated in ramosa mutants:

         1mm                  2mm
mutant   up (%)   down (%)    up (%)   down (%)
ra1      53       47          59       41
ra2      56       44          51       49
ra3      65       35          46       54
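The up/down split reported for each mutant amounts to counting the sign of each gene's expression change. The sketch below assumes log2 fold changes computed as mutant relative to wild-type, so positive means up-regulated in the mutant; both that convention and the example values are hypothetical and not taken from the slides.

```python
def percent_up_down(log2_fold_changes):
    """Return (percent up, percent down) among genes with a non-zero change."""
    up = sum(1 for value in log2_fold_changes if value > 0)
    down = sum(1 for value in log2_fold_changes if value < 0)
    total = up + down
    if total == 0:
        raise ValueError("no genes with a non-zero fold change")
    return 100 * up / total, 100 * down / total


# Hypothetical log2(mutant / wild-type) values for five differentially expressed genes.
example_changes = [1.2, -0.8, 2.0, 0.5, -1.5]
up_pct, down_pct = percent_up_down(example_changes)
print(up_pct, down_pct)
```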

K-means co-expression clusters with significant enrichment for genes differentially expressed in ra1 and ra3 mutants

[Figure: three cluster profiles. y-axis: relative expression (-2 to 2); x-axis: tip, mid, base (ear development) and stg1, stg2, stg3 (tassel development). Panel labels: 38%, 55%, 47%]
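Co-expression clustering of this kind can be sketched in two steps: scale each gene's profile across the six samples, then group the scaled profiles with k-means. The code below is a minimal pure-Python illustration, not the analysis behind the slides. It assumes that "relative expression" is a per-gene z-score, uses invented FPKM profiles, and takes the starting centroids as an argument so that the result is reproducible.

```python
import math


def zscore(profile):
    """Scale one gene's profile to mean 0 and (population) standard deviation 1."""
    n = len(profile)
    mean = sum(profile) / n
    variance = sum((v - mean) ** 2 for v in profile) / n
    if variance == 0:
        return [0.0] * n  # flat profile: no relative change to report
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in profile]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical FPKM profiles over: tip, mid, base, stg1, stg2, stg3.
profiles = {
    "geneA": [50, 30, 10, 45, 25, 8],
    "geneB": [100, 60, 25, 95, 50, 20],
    "geneC": [5, 20, 40, 4, 18, 42],
    "geneD": [1, 6, 12, 2, 5, 11],
}
genes = list(profiles)
scaled = [zscore(profiles[g]) for g in genes]
labels, centroids = kmeans(scaled, initial_centroids=[scaled[0], scaled[2]])
clusters = dict(zip(genes, labels))
print(clusters)
```

Scaling first means genes are grouped by the shape of their profile rather than by absolute expression level, which is why geneA and geneB end up together despite a two-fold difference in FPKM. In practice k-means is usually run from several random starts; fixed starting centroids are used here only to keep the example deterministic.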

Gene Ontology Class            P-value
Protein complex assembly       0.0001
Response to freezing           0.0017
Multicellular homeostasis
Signaling pathway              0.002
G-protein receptor signaling   0.008
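Enrichment P-values such as those for the clusters and the Gene Ontology classes are often computed with a one-sided hypergeometric test. The slides do not name the test that was used, so the sketch below is an assumption about a typical approach, and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample_size, annotated_in_sample):
    """One-sided P(X >= annotated_in_sample) under the hypergeometric distribution.

    population: all genes considered (N)
    annotated: genes in the population carrying the label of interest (K)
    sample_size: genes in the cluster or gene list being tested (n)
    annotated_in_sample: labelled genes observed in that list (k)
    """
    N, K, n, k = population, annotated, sample_size, annotated_in_sample
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical example: 10 genes in total, 5 differentially expressed,
# and a cluster of 5 genes in which all 5 are differentially expressed.
p_value = hypergeom_enrichment_p(population=10, annotated=5, sample_size=5, annotated_in_sample=5)
print(p_value)
```

When many clusters or GO classes are tested, the resulting P-values would normally also need a multiple-testing correction.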

TFs that are co-expressed with RA1 and RA3 and DE in mutants

[Figure: bar charts of expression (FPKM) in wt, ra1 and ra3 at 1mm and 2mm for RA1, RA3 and the co-expressed TFs TCP, C3H, ZAP1, HB-trihelix and bZIP. * (up-regulated); * (down-regulated) = p < 0.001]

Differentially expressed GRAS family TFs

• GRMZM2G079470
• GRMZM2G024973: Dwarf9
• GRMZM2G144744: Dwarf8

[Figure: bar charts of expression (FPKM) in wt, ra1, ra2 and ra3 at 1mm and 2mm for each gene. * (up-regulated); * (down-regulated) = p < 0.001]

• Genes involved in GA signaling and at the intersection of multiple hormone pathways

TFs differentially expressed in both ra1 and ra3 mutants

1mm: 64 TFs; 2mm: 49 TFs

TF-family up down alt
HB 5 1 2
AP2-EREBP 4 2
C2H2 1 2 1
C3H 1 2 1
TCP 3
MYB 1 2
HMG 1 2
AUX-IAA 1 1
MADS 2 0
NAC 1 1
ABI3-VP1 1 1
bZIP 1 1
bHLH 2
WRKY 1 1
SBP 1
BES 1
LFY 1
GRAS 1

TF-family up down alt
bZIP 2 3 1
HB 4 1
C3H 2 2
LUG 2 1
AUX-IAA 2
MADS 2
BES1 1 1
C2H2-Dof 1 1
AP2-EREBP 1 1
MYB 2
LFY 2
HMG 1 1
SBP 1
GRAS 1
WRKY 1
LOB 1

Clustering of differentially expressed AP2-EREBP genes

*capture quantitative differences in TF gene family expression

[Figure: clustered heat map; columns: wild-type, ramosa1, ramosa2, ramosa3, each at 1mm and 2mm; labeled gene: GRMZM2G069082]

TF expression in ra mutants (1mm):

[Figure: heat map with columns wt, ra1, ra3 and a # TFs column; labels as extracted: 6, 1, 918, 3; scale 0, 5, 10]

ra3 specific TFs:

Gene id         TF family             Description
GRMZM2G124011   DDF; CBF; AP2-EREBP   CBF6
GRMZM2G002894   EMB1967; FHA          uncharacterized
GRMZM2G466044   AP2-EREBP             uncharacterized
GRMZM2G404375   Trihelix              uncharacterized
GRMZM2G059102   MADS                  MADS Box TF 47

Highlighted:

• GRMZM2G124011 (DDF; CBF; ERF/AP2): CBF6 (*Dehydration responsive element)
• GRMZM2G466044 (AP2-EREBP): uncharacterized

*Expression of these TFs enriched in tassel

Expression of tandem AP2-EREBP genes enriched in tassel

Genes shown (Chr7 and Chr2):

• GRMZM2G124011: CBF6
• GRMZM2G124037: CBF3
• GRMZM2G069082
• GRMZM2G069146

[Figure: expression (FPKM) of the four genes across ear development (tip, mid, base) and tassel development (stg1, stg2, stg3); * marks flagged values]

[Figure: expression (FPKM) in wt, ra1, ra2 and ra3 at 1mm and 2mm; panel labels: ear, tassel; labeled gene: GRMZM2G069082]

[Figure: comparison across Maize chr 2, Maize chr 7, Rice chr 9 and Sorghum chr 2. * EnsemblCompara gene trees; **Josh Stein]

*Ortholog of differentially expressed CBF genes in sorghum is inflorescence-specific

Summary

• AP2-EREBP TFs: possible intersects with development and stress

• Integrate large data sets to extract biological information

• Genome-wide expression profiles: build framework for exploratory research and continuous refinement

• Further refine the system through integration of heterogeneous data

• Prioritize candidates for downstream analyses

Moving Forward: Integrating heterogeneous data types

• Model system: developing maize inflorescence primordia
• Genome-wide TF occupancy information: ChIP-seq
- RA1 transgene (both GFP and HA tagged) introduced into ra1
- Determine targets for key transcriptional regulators
• Enrichment of cis-regulatory elements
- Combinatorial binding of TFs (**example: KN1 interacts with RA1 in Y2H)
- Gene expression influenced by both cis- and trans-acting factors
• Small RNA profiles
• Variation data: associate genotype-to-phenotype
- Overlay RNA-seq data on variation maps

Acknowledgements

David Jackson Doreen Ware

• Sasha Goldshmidt • Jer-ming Chia

• Andrew Olson

• Sunita Kumari

• Molly Hammell

• Elena Ghiban

• Richard McCombie

• Tom Brutnell, BTI

• Lin Wang

• Robert Schmidt, UCSD

• Jerry Lu

• Shiran Pasternak

• Michael Regulski

• Michael Pautler

• Josh Stein

• Erik Vollbrecht, Iowa State

CSHL

Maize inflorescence project

NSF Postdoctoral Fellowship in Biological Informatics

• Aaron Chuah

• Laura Gelley

NSF PGRP