GENE EXPRESSION PATTERNS IN BARLEY DEVELOPMENT druka... · Arnis Druka (1), Gary Muehlbauer (2),...

1
The data set will be publicly available from: BarleyBase at http://barleypop.vrac.iastate.edu/BarleyBase/ ArrayExpress at http://www.ebi.ac.uk/arrayexpress/ flag leaf ligule/collar just visible (GRO:0007089) 1-2 days before anthesis (GRO:0007101) kernel watery ripe (GRO:0007106) early milk (GRO:0007107) late milk (GRO:0007109) soft dough (GRO:0007112) immature inflorescence pistil anthers bracts 5 DAP caryopsis 10 DAP caryopsis 16 DAP caryopsis 22 DAP caryopsis (no embryo) embryo from 22 DAP caryopsis coleoptile emerged from seed (GRO:0007056) first leaf unfolded (GRO:07060) coleoptile hypocotyl radicle (seminal roots) crown 10 cm seedling root Tissue types were mapped to Cereal Plant Anatomy and Development Stage ontologies developed by Gramene 2178 1064 1668 2203 613 677 830 <NONE> (5824) INFORMATIVE (4774) HOMOLOGS (3788) 6811 ALL GENES (16044) 100 Morex Golden Promise CROWN 10 0.1 1 10 100 1 0.1 1000 1000 crown pis til 22 DAP end radicle root 5 DAP 10 DAP 22 DAP emb bracts inflorescence anthers leaf 16 DAP coleoptile hypocotyl pis til 22 DAP end 5 DAP 10 DAP 22 DAP emb 16 DAP 1 Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, SCOTLAND 2 Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA 3 Department of Plant Pathology, Iowa State University, Ames, Iowa 50110-1120 USA 4 Department of Crop and Soil Science, Oregon State University, Corvallis, OR 97331 USA 5 Departments of Crop and Soil Sciences & Genetics and Cell Biology, Washington State University, Pullman, WA 99164-6420, USA 6 Institut für Pflanzengenetik und Kulturpflanzenforschung, Correnstraße 2, D-06466 Gatersleben, GERMANY 7 MTT/BI Plant Genomics Laboratory, University of Helsinki and MTT Agrifood Research Finland, P.O. Box 56, FIN-00014 Helsinki, FINLAND 8 University of Adelaide, Plant Science, Waite Campus, PMB 1, Glen Osmond SA 5064, AUSTRALIA 9 Department of Botany and Plant Sciences, University of California, Riverside, CA, 92521-0124, USA 10 Research Institute for Bioresources, Okayama University, Kurashiki, 710-0046, JAPAN Barley, Hordeum vulgare L. is particularly well suited for investigating patterns in monocot development. The recent release of the 22K Affymetrix Barley1 GeneChip probe array provides the opportunity to examine gene expression throughout the life cycle of a major cereal crop. In order to provide a reference data set for future investigations and hypothesis testing, transcriptional profiles of ca 22,700 barley genes were examined for 8 developmental stages and 15 tissue types in three independent replications. Here we present the results of initial data analysis from barley and outline the potential of this dataset for immediate genetic target identification and use for tissue-specific studies. GENE EXPRESSION PATTERNS IN BARLEY DEVELOPMENT Arnis Druka (1), Gary Muehlbauer (2), Ilze Druka (1), Rico Caldo (3), Ute Baumann (8), Andreas Schreiber (8), Roger Wise (3), Timothy Close (9), Andris Kleinhofs (5), Andreas Graner (6), Alan Schulman (7), Peter Langridge (8), Kazuhiro Sato (10) , Patrick Hayes (4), David Marshall (1) and Robbie Waugh (1) EXPERIMENTAL DESIGN tissue types species genotypes MOREX GOLDEN PROMISE BARLEY WHEAT 22 DAP CARYOPSIS 22 DAP EMBRYO 16 DAP CARYOPSIS 10 DAP CARYOPSIS 5 DAP CARYOPSIS ANTHERS PISTIL BRACTS INFLORESCENCE ROOT CROWN LEAF RADICLE HYPOCOTYL COLEOPTILE genes replicates 1 2 3 22 840 Barley1 Genechip Total of 90,446,400 data points will be captured SPECIFIC OBJECTIVES AND METHODS 1) Evaluate Barley1 GeneChip Affymetrix internal controls and measures from the MAS 5.0 EXP files were used to assess data quality and to perform analysis of variance (ANOVA). The total number of statistically significant positive signals and the level of variability across 21 condition was estimated. 2) What are the genes specifically expressed in each of 15 tissue types? Diagonal linear discriminant analysis (DLDA) (Dudoit, S. & Fridlyand, J. 2003) was used to determine tissue specific genes. Probe sets with permutation test p value >0.005 were selected as a tissue specific. 3) What are the genes differentially expressed between two barley cultivars? ANOVA using Welch t-test (p>0.05) and Benjamini & Hochberg False Discovery Rate detection algorithmn as a multiple testing correction was used to determine differentially expressed genes between 6 tissue types of cvs Morex and Golden Promise. 4) Exploration of different complexity reduction methods To identify naturally occurring clusters within the data set, QT clustering (Heyer, L. et al 1999) and a template matching method (Pavlidis, P. & Noble, S. 2001) was used on three different experiment interpretations; tissue types (15 conditions), genotype-tissue types (12 conditions) and a seed development time course (6 conditions). 5) Assessment of the hypothesis building potential Gene function can be inferred based on clustering of expression patterns with known function genes. Hypothetical genes have to be present in the classes containing informative known function genes. Assesment of distribution of the hypotetical genes was done by calculating proportion of them in gene classes from 2); 3) and 4) RESULTS By performing this experiment we have captured highly informative data set which can be used for expression pattern - based gene annotation, immediate genetic target identification, hypothesis building, expression data validation and the future experiment planning. Sequence - based annotation and confirmation of the selected probe sets is our current task. Analysis and integration of wheat expression data is planned for the near future. 100 300 500 pistil 5 DAP 10 DAP 16 DAP 22 DAP 22 DAP hypocotyl coleoptile radicle leaf crown root inflorescence anthers bracts Results of discriminant analysis performed to identify genes specifically expressed in the radicle. The typical output of the QT clustering, using standard correlation 0.75, and a gene count >30 per cluster. Only two clusters are shown Results of 2-way ANOVA to identify differentially expressed genes in the crown tissue between cvs Morex and Golden Promise. <NONE> best BlastX hits HOMOLOGS best BlastX hits to hypothetical rice, arabidopsis and other proteins INFORMATIVE group of hypothetical genes distributed among previously established groups. Affymetrix suggested Background (Avg) 57.3 9 20-100 Present (P) calls 63.40% 3.40% Absent (A) calls 35.00% 3.30% Marginal (M) calls 1.60% 0.20% Noise (RawQ) 2.6 0.3 replicates Scale Factor (SF) 1.1 0.3 replicates bioB (% P) A - 3%, M - 1% na P>50% Increasing bioB ? yes na increase Test F P F critical SF replicates 1 0.4 3.2 SF conditions 7.5 3.90E-08 1.8 Bio replicates 0.3 0.99 1.7 Bio conditions 9.8 1.77E-23 1.6 Scale factor (SF) and Bio control analysis of variance Measure Our experiment (mean) STDV Consolidated information from 63 GeneChips representing 21 conditions and 3 type-2 biological replicates. GP Mx Mx Mx A B C B A C A B C A B C A C B B A C A B B C C A B C A B C A A B C B C A A B C B C A A B C A C B A B C A B C A B C A B C A B C COL ANTH INFL EMB22 Mx CRO Mx GP ROOT RADICLE GP Mx GP HYP Mx GP LEAF Mx GP PST BRC 5 DAP 16 DAP 10 DAP END22 GP Two-way clustering of 63 Barley1 GeneChips and 22,840 probe sets based on average linkage and a standard correlation 10 cm seedling leaf 1) Barley GeneChip evaluation The experiment data quality measures meet or exceed manufacturers suggested values. Analysis of variance using Affymetrix controls indicated that variance of expression values is not significant (p=0.4 and 0.99) for biological replicates, while it is highly significant between conditions (p=E-8 and E-23). The 2-way hierarchical clustering based on average linkage of 63 GeneChips and 22,840 probe sets in all cases placed corresponding replicates representing a tissue type on the same secondary clade. On average 63.4% +/- 3.4 % probe sets per chip were determined as statistically positive (p>=0.05 per probe set). The total number of statistically significant probe sets was 20,645 or 90%, meaning that on average 300 probesets per condition are specific to it. 2) Finding tissue type classes The 15,936 statistically significant differentially expressed genes between at least two conditions were found. Using slightly modified linear diagonal discriminant analysis (d>0.5) on average 85 probe sets per tissue type were identified . 3) Genotype-dependent gene expression A total of 1,236 probe sets were identified as differentially expressed (>2.5 fold) between two barley cultivars, Golden Promise and Morex. 446 (about 30%) probe sets were common for at least two different tissue types. Crown and leaf were the tissue types where most of the differentially expressed genes were found, while in the root none was detected. 4) Complexity reduction Using QT clustering we found 4,774 probe sets in 84 clusters from tissue type interpretation (r>0.75, min 25), 5,824 probe sets in 40 clusters from genotype-tissue interpretation (r>0.75, min 50) and 3788 probe sets grouped in 69 clusters (r>0.9, min 50) from seed development time course. The 55% of total probe sets or 78% of informative probe sets can be assigned to the clusters. The overlap between three groups was only 17%. 5) Partitioning of the hypothetical genes There are 9,529 hypothetical gene probe sets on the on the Barley1 GeneChip of which 5,096 of them belong to different expression classes. There are total of 13,397 explained informative probe sets, 62% of which have some functional assignment. The individual classes contain on average about 30% of the hypothetical genes, indicating unbiased expression profile-based partitioning of the hypothetical genes. CONCLUSIONS Venn diagram representing different groups of hypothetical genes. Seed development time course analysis using template matching method with smooth correlation >0.9 Expression profiles during seed development Besides QT clustering we used template matching method for finding genes differentially expressed during seed development as shown in the graphs on the left. The overlap between seed development QT clusters and a supervised partitioning was 34% suggesting that by using several alternative partitioning methods it is possible to achieve significant increase in efficiency of complexity reduction. COL - coleoptile INFL - immature inflorescence EMB22 - embryo from 22DAP caryopsis CRO - crown HYP - hypocotyl BRC - bracts ANTH - anthers PST - pistil 5, 10, 16 DAP - caryopsis, days after pollination END22 - 22 DAP caryopsis without embryo GP - Golden Promise Mx - Morex

Transcript of GENE EXPRESSION PATTERNS IN BARLEY DEVELOPMENT druka... · Arnis Druka (1), Gary Muehlbauer (2),...

Page 1: GENE EXPRESSION PATTERNS IN BARLEY DEVELOPMENT druka... · Arnis Druka (1), Gary Muehlbauer (2), Ilze Druka (1), Rico Caldo (3), Ute Baumann (8), Andreas Schreiber (8), Roger Wise

The data set will be publicly available from:BarleyBase at http://barleypop.vrac.iastate.edu/BarleyBase/ArrayExpress at http://www.ebi.ac.uk/arrayexpress/

flag leaf ligule/collar just visible (GRO:0007089)

1-2 days before anthesis (GRO:0007101)

kernel watery ripe (GRO:0007106)

early milk (GRO:0007107)

late milk (GRO:0007109)

soft dough (GRO:0007112)

immature inflorescence

pistil

anthers

bracts

5 DAP caryopsis

10 DAP caryopsis

16 DAP caryopsis

22 DAP caryopsis (no embryo)

embryo from 22 DAP caryopsis

coleoptile emerged from seed (GRO:0007056)

first leaf unfolded (GRO:07060)

coleoptile

hypocotyl

radicle(seminal roots)

crown

10 cm seedling root

Tissue types were mapped to Cereal Plant Anatomy and Development Stage ontologies developed by Gramene

2178 1064

1668

2203

613 677

830

<NONE > (5824)

INF OR MAT IV E (4774)

HOMOLOG S (3788)

6811 ALL G E NE S (16044)

100

Morex

Gol

den

Prom

ise

CROWN

10

0.1

1

10 10010.1

1000

1000

crown

pistil

22 DA

P end

radicle

root5 D

AP

10 DA

P

22 DA

P em

b

bracts

inflorescence

anthers

leaf16 D

AP

coleoptile

hypocotyl

pis til 22 DAP end

5 DAP 10 DAP 22 DAP emb

16 DAP

1 Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, SCOTLAND2 Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA 3 Department of Plant Pathology, Iowa State University, Ames, Iowa 50110-1120 USA 4 Department of Crop and Soil Science, Oregon State University, Corvallis, OR 97331 USA 5 Departments of Crop and Soil Sciences & Genetics and Cell Biology, Washington State University, Pullman, WA 99164-6420, USA 6 Institut für Pflanzengenetik und Kulturpflanzenforschung, Correnstraße 2, D-06466 Gatersleben, GERMANY 7 MTT/BI Plant Genomics Laboratory, University of Helsinki and MTT Agrifood Research Finland, P.O. Box 56, FIN-00014 Helsinki, FINLAND 8 University of Adelaide, Plant Science, Waite Campus, PMB 1, Glen Osmond SA 5064, AUSTRALIA9 Department of Botany and Plant Sciences, University of California, Riverside, CA, 92521-0124, USA 10 Research Institute for Bioresources, Okayama University, Kurashiki, 710-0046, JAPAN

Barley, Hordeum vulgare L. is particularly well suited for investigating patterns in monocot development. The recent release of the 22K Affymetrix Barley1 GeneChip probe array provides the opportunity to examine gene expression throughout the life cycle of a major cereal crop. In order to provide a reference data set for future investigations and hypothesis testing, transcriptional profiles of ca 22,700 barley genes were examined for 8 developmental stages and 15 tissue types in three independent replications. Here we present the results of initial data analysis from barley and outline the potential of this dataset for immediate genetic target identification and use for tissue-specific studies.

GENE EXPRESSION PATTERNS IN BARLEY DEVELOPMENTArnis Druka (1), Gary Muehlbauer (2), Ilze Druka (1), Rico Caldo (3), Ute Baumann (8), Andreas Schreiber (8), Roger Wise (3), Timothy Close (9), Andris Kleinhofs (5), Andreas Graner (6), Alan Schulman (7), Peter Langridge (8), Kazuhiro Sato (10) , Patrick Hayes (4), David Marshall (1) and Robbie Waugh (1)

EXPERIMENTAL DESIGN

tissue types

species

genotyp

es

MOREX

GOLDEN PROMISE

BARLEY

WHEAT

22 D

AP

CA

RY

OP

SIS

22 D

AP

EM

BR

YO

16 D

AP

CA

RY

OP

SIS

10 D

AP

CA

RY

OP

SIS

5 D

AP

CA

RY

OP

SIS

AN

THE

RS

PIS

TIL

BR

AC

TS

INFL

OR

ES

CE

NC

E

RO

OT

CR

OW

N

LEA

F

RA

DIC

LE

HY

PO

CO

TYL

CO

LEO

PTI

LE

genes

replicates

1

2

3

22 840 Barley1 G

enechip

Total of 90,446,400 data points will be captured

SPECIFIC OBJECTIVES AND METHODS 1) Evaluate Barley1 GeneChip

Affymetrix internal controls and measures from the MAS 5.0 EXP files were used to assess data quality and to perform analysis of variance (ANOVA). The total number of statistically significant positive signals and the level of variability across 21 condition was estimated.

2) What are the genes specifically expressed in each of 15 tissue types?

Diagonal linear discriminant analysis (DLDA) (Dudoit, S. & Fridlyand, J. 2003) was used to determine tissue specific genes. Probe sets with permutation test p value >0.005 were selected as a tissue specific.

3) What are the genes differentially expressed between two barley cultivars?

ANOVA using Welch t-test (p>0.05) and Benjamini & Hochberg False Discovery Rate detection algorithmn as a multiple testing correction was used to determine differentially expressed genes between 6 tissue types of cvs Morex and Golden Promise.

4) Exploration of different complexity reduction methods To identify naturally occurring clusters within the data set, QT clustering (Heyer, L. et al 1999) and a template matching method (Pavlidis, P. & Noble, S. 2001) was used on three different experiment interpretations; tissue types (15 conditions), genotype-tissue types (12 conditions) and a seed development time course (6 conditions).

5) Assessment of the hypothesis building potential Gene function can be inferred based on clustering of expression patterns with known function genes. Hypothetical genes have to be present in the classes containing informative known function genes. Assesment of distribution of the hypotetical genes was done by calculating proportion of them in gene classes from 2); 3) and 4)

RESULTS

By performing this experiment we have captured highly informative data set which can be used for expression pattern - based gene annotation, immediate genetic target identification, hypothesis building, expression data validation and the future experiment planning. Sequence - based annotation and confirmation of the selected probe sets is our current task. Analysis and integration of wheat expression data is planned for the near future.

100

300

500

pist

il5

DA

P10

DA

P16

DA

P22

DA

P22

DA

P

hyp

ocot

yl c

oleo

ptile

radi

cle

leaf

crow

nro

otin

flore

scen

cean

ther

sbr

acts

Results of discriminant analysis performed to identify genes specifically expressed in the radicle.

The typical output of the QT clustering, using standard correlation 0.75, and a gene count >30 per cluster. Onlytwo clusters are shown

Results of 2-way ANOVA to identify differentially expressed genes in the crown tissue between cvs Morex and Golden Promise.

<NONE> best BlastX hits

HOMOLOGSbest BlastX hits to hypothetical rice, arabidopsis and other proteins

INFORMATIVEgroup of hypothetical genes distributed amongpreviously established groups.

Affymetrix suggested

Background (Avg) 57.3 9 20-100Present (P) calls 63.40% 3.40%  Absent (A) calls 35.00% 3.30%  Marginal (M) calls 1.60% 0.20%  Noise (RawQ) 2.6 0.3 replicatesScale Factor (SF) 1.1 0.3 replicatesbioB (% P) A - 3%, M - 1% na P>50%Increasing bioB ? yes na increase

Test F P F criticalSF replicates 1 0.4 3.2SF conditions 7.5 3.90E-08 1.8Bio replicates 0.3 0.99 1.7Bio conditions 9.8 1.77E-23 1.6

Scale factor (SF) and Bio control analysis of variance

Measure Our experiment (mean) STDV

Consolidated information from 63 GeneChips representing21 conditions and 3 type-2 biological replicates.

GPMx

Mx

Mx

A B C B A C A B C A B C A C B B A C A B B C C A B C A B C A A B C B C A A B C B C A A B C A C B A B C A B C A B C A B C A B C

COL ANTHINFL

EMB22

Mx

CRO

Mx GP

ROOT RADICLE

GP

MxGP

HYP

MxGP

LEAF

Mx GP

PSTBRC

5 DAP

16 DAP

10 DAP

END22GP

Two-way clustering of 63 Barley1 GeneChips and 22,840 probe sets based on average linkage and a standard correlation

10 cm seedlingleaf

1) Barley GeneChip evaluation

The experiment data quality measures meet or exceed manufacturers suggested values. Analysis of variance using Affymetrix controls indicated that variance of expression values is not significant (p=0.4 and 0.99) for biological replicates, while it is highly significant between conditions (p=E-8 and E-23). The 2-way hierarchical clustering based on average linkage of 63 GeneChips and 22,840 probe sets in all cases placed corresponding replicates representing a tissue type on the same secondary clade.

On average 63.4% +/- 3.4 % probe sets per chip were determined as statistically positive (p>=0.05 per probe set). The total number of statistically significant probe sets was 20,645 or 90%, meaning that on average 300 probesets per condition are specific to it.

2) Finding tissue type classes

The 15,936 statistically significant differentially expressed genes between at least two conditions were found. Using slightly modified linear diagonal discriminant analysis (d>0.5) on average 85 probe sets per tissue type were identified .

3) Genotype-dependent gene expression

A total of 1,236 probe sets were identified as differentially expressed (>2.5 fold) between two barley cultivars, Golden Promise and Morex. 446 (about 30%) probe sets were common for at least two different tissue types. Crown and leaf were the tissue types where most of the differentially expressed genes were found, while in the root none was detected.

4) Complexity reduction

Using QT clustering we found 4,774 probe sets in 84 clusters from tissue type interpretation (r>0.75, min 25), 5,824 probe sets in 40 clusters from genotype-tissue interpretation (r>0.75, min 50) and 3788 probe sets grouped in 69 clusters (r>0.9, min 50) from seed development time course. The 55% of total probe sets or 78% of informative probe sets can be assigned to the clusters. The overlap between three groups was only 17%.

5) Partitioning of the hypothetical genes

There are 9,529 hypothetical gene probe sets on the on the Barley1 GeneChip of which 5,096 of them belong to different expression classes. There are total of 13,397 explained informative probe sets, 62% of which have some functional assignment. The individual classes contain on average about 30% of the hypothetical genes, indicating unbiased expression profile-based partitioning of the hypothetical genes.

CONCLUSIONS

Venn diagram representing different groups of hypothetical genes.

Seed development time course analysis using template matching method with smooth correlation >0.9

Expression profiles during seed development

Besides QT clustering we used template matching method for finding genes differentially expressed during seed development as shown in the graphs on the left. The overlapbetween seed development QT clusters and a supervisedpartitioning was 34% suggesting that by using several alternative partitioning methods it is possible to achieve significant increase in efficiency of complexity reduction.

COL - coleoptile

INFL - immature inflorescence

EMB22 - embryo from 22DAP caryopsis

CRO - crown

HYP - hypocotyl

BRC - bracts

ANTH - anthers

PST - pistil

5, 10, 16 DAP - caryopsis, days after pollination

END22 - 22 DAP caryopsis without embryo

GP - Golden Promise

Mx - Morex