Supplementary Materials - Springer Static Content Server10.1007/s00439-014-1524... · Web viewWord...


Supplementary Materials

A Systematic Heritability Analysis of the Human Whole Blood Transcriptome

Tianxiao Huan1,2, Chunyu Liu1,2, Roby Joehanes1,2,3, Xiaoling Zhang1, Brian H. Chen1,2, Andrew D. Johnson1, Chen Yao1,2, Paul Courchesne1,2, Christopher J. O'Donnell1, Peter J. Munson3, and Daniel Levy1,2

1 The National Heart, Lung and Blood Institute’s Framingham Heart Study, Framingham, MA
2 Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, Bethesda, MD
3 Mathematical and Statistical Computing Laboratory, Division of Computational Bioscience, Center for Information Technology, National Institutes of Health, Bethesda, MD

Correspondence should be addressed to:

Daniel Levy, MD
Framingham Heart Study
Population Sciences Branch
National Heart, Lung, and Blood Institute
73 Mt. Wayte Avenue, Suite 2
Framingham, MA 01702
Email: [email protected]
Phone: 508-935-3458
Fax: 508-872-2678

Running title: Heritability of the Blood Transcriptome

Key words: heritability, eQTL, transcriptome, gene expression

Word count of the abstract: 245
Word count of the body without references or tables: 5637
Word count of the body including references and tables: 9814
Numbers of figures: 5
Numbers of tables: 4


Supplementary Notes

Heritability estimation in relation to cohorts and sample size

It is well known that heritability estimates are affected by sample size and family structure (Tenesa and Haley 2013). Previous studies reported heritability for gene expression in smaller samples (n ≈ 1000) with relatively simple family structures (Emilsson et al. 2008; Göring et al. 2007; Grundberg et al. 2012; Price et al. 2011; Stranger et al. 2007). For example, Grundberg et al. reported heritability for gene expression in twins (Grundberg et al. 2012), and Price et al. reported heritability for gene expression in a population sample via identity-by-descent (IBD) in related or unrelated individuals (Price et al. 2011).

We found that 40% of gene transcript levels are heritable, and the average heritability of gene expression genome-wide is estimated to be 0.07 (for all 18,000 genes) and 0.13 (for 7,161 genes with $h^2_{geneExp}$ > 0). Several studies reported that 40-70% of gene transcripts are heritable with $h^2_{geneExp}$ > 0 (Emilsson et al. 2008; Göring et al. 2007; Price et al. 2011; Stranger et al. 2007). Dixon et al. reported that the average heritability for global gene expression is 0.2 (Dixon et al. 2007), and the average heritability for transcripts with $h^2_{geneExp}$ > 0 ranges between 0.15 and 0.30 (Emilsson et al. 2008; Price et al. 2011; Stranger et al. 2007). These heritability estimates are considerably larger than ours.

We further compared the heritability estimates at the individual transcript level between our study and the study by Göring et al. (Göring et al. 2007). We used the following formula to test whether heritability estimates were smaller in FHS than in the study by Göring et al.:

$$\Delta = \frac{h^2_{FHS} - h^2_{Goring}}{h^2_{Goring}} \times 100\%,$$

where $h^2_{FHS}$ and $h^2_{Goring}$ represent the heritability estimates in FHS and in the study by Göring et al. (Göring et al. 2007), respectively, for the same gene transcripts. Supplementary Table S3 shows that 62% of the 838 genes with $h^2_{geneExp}$ > 0.2 display lower heritability estimates in FHS than in the study by Göring et al., using the rule Δ < -10%. The Pearson correlation between the heritability estimates in the two studies is only 0.3. However, as the heritability estimates increase, a smaller proportion of genes display lower heritability estimates in FHS than in the study by Göring et al. In addition, the heritability estimates become more concordant between the two studies for genes with higher heritability estimates: the Pearson correlation between the heritability estimates of genes in both studies increased from 0.3 to 0.85 as the heritability estimates increased from 0.2 to 0.8.
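The comparison rule above can be illustrated with a minimal Python sketch. The function names and the toy heritability pairs are hypothetical and are not taken from Supplementary Table S3; the sketch only shows how Δ, the Δ < -10% rule, and a Pearson correlation could be computed for paired estimates.

```python
import math

def relative_difference(h2_fhs, h2_goring):
    """Delta = (h2_FHS - h2_Goring) / h2_Goring, expressed in percent."""
    return (h2_fhs - h2_goring) / h2_goring * 100.0

def fraction_lower(pairs, threshold=-10.0):
    """Fraction of genes whose Delta is below the threshold (in percent)."""
    deltas = [relative_difference(fhs, goring) for fhs, goring in pairs]
    return sum(d < threshold for d in deltas) / len(deltas)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical (h2_FHS, h2_Goring) pairs for four genes; illustration only.
toy_pairs = [(0.25, 0.40), (0.30, 0.31), (0.45, 0.60), (0.70, 0.72)]

share = fraction_lower(toy_pairs)
r = pearson([p[0] for p in toy_pairs], [p[1] for p in toy_pairs])
print(f"share with Delta < -10%: {share:.2f}; Pearson r: {r:.2f}")
```

In this toy input, two of the four genes fall below the -10% cutoff, so the reported share is 0.50.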


The overall distribution of $h^2_{geneExp}$ for genes in the FHS offspring cohort and the third generation cohort is similar (Supplementary Figure S2). The average heritability is slightly higher in the offspring cohort (average $h^2_{geneExp}$ = 0.087 and median $h^2_{geneExp}$ = 0.052) than in the third generation cohort (average $h^2_{geneExp}$ = 0.078 and median $h^2_{geneExp}$ = 0.044). To investigate whether sample size and family structure affect heritability estimates, we randomly selected n=50, 100, and 200 nuclear families from the full sample. We estimated heritability using two family structures: 1) offspring only (that is, only offspring individuals in these n=50, 100, and 200 nuclear families) and 2) both offspring and parents (that is, the entire nuclear families). To minimize fluctuations, we repeated the random sampling of nuclear families 500 times. We randomly selected 10 genes in each of seven $h^2_{geneExp}$ levels (70 genes in total), $h^2_{geneExp}$ = (0, 0.1), (0.1, 0.2), (0.2, 0.3), (0.3, 0.4), (0.4, 0.5), (0.5, 0.6), and (0.6, 1), to represent the full range of gene expression heritability. Supplementary Figure S3 displays $h^2_{geneExp}$ estimates from different numbers of nuclear families for the 70 genes. The standard errors of $h^2_{geneExp}$ are 0.16, 0.11, and 0.07 on average when sampling 50, 100, and 200 full families, and 0.18, 0.12, and 0.08 when sampling only the siblings from 50, 100, and 200 nuclear families. This result indicates that larger sample sizes and more comprehensive family structures provide more precise heritability estimates.
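As a rough plausibility check on the standard errors reported above, the following Python sketch compares them with a simple 1/sqrt(n) scaling in the number of families. That scaling is an assumption made here for illustration and is not stated in the source; the helper name `predicted_se` is hypothetical.

```python
import math

# Reported average standard errors (from the text above).
families = [50, 100, 200]
se_full = [0.16, 0.11, 0.07]   # offspring and parents
se_sibs = [0.18, 0.12, 0.08]   # offspring (siblings) only

def predicted_se(se_ref, n_ref, n):
    """SE expected at n families if SE scales as 1/sqrt(n) (assumed)."""
    return se_ref * math.sqrt(n_ref / n)

for label, reported in (("full families", se_full), ("siblings only", se_sibs)):
    for n, se in zip(families, reported):
        expected = predicted_se(reported[0], families[0], n)
        print(f"{label}, n={n}: reported {se:.2f}, 1/sqrt(n) scaling {expected:.3f}")
```

Under this assumption the reported values stay within about 0.01 of the scaled values, which is consistent with precision improving mainly through sample size, with a smaller additional gain from including parents.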

Heritability estimation in relation to cell types

Previous heritability estimates were reported for gene expression measured in different tissues [19, 20]. In the FHS, gene expression levels were measured using whole blood derived RNA, which contains multiple cell types. To gain insight into how different cell types in whole blood affect heritability estimates, we compared the heritability estimates for transcripts (n=18,000) with and without adjusting for cell types (i.e. white blood cells [WBCs], neutrophils, lymphocytes, monocytes, platelets, eosinophils, and basophils) as additional covariates in the LME models. The cell count proportions in whole blood were measured in 2138 individuals. We imputed the cell count proportions of all FHS individuals by Partial Least Squares (PLS) regression (Boardman et al. 1981). The imputed cell count proportions were highly consistent with the measured cell count proportions (Joehanes R, PhD, in preparation). The average $h^2_{geneExp}$ remained unchanged: the average $h^2_{geneExp}$ = 0.072 both before and after accounting for cell counts.

Furthermore, we selected individuals falling within the top 25% and bottom 25% for each of three cell types (WBCs, neutrophils, and lymphocytes) and compared the heritability estimates between the top and bottom groups for each cell type. We used LME models implemented in lmekin() (Abecasis et al. 2001) to test the associations between cell types and transcripts. The transcripts associated with cell types at Bonferroni-corrected p<0.05 were used to compare heritability levels between samples from the top and bottom cell type proportion levels. The WBC count affected heritability estimates the most. The average $h^2_{geneExp}$ was higher in individuals in the top quartile of WBC (average $h^2_{geneExp}$ = 0.13) than in those in the bottom quartile (average $h^2_{geneExp}$ = 0.10; the heritability estimates are significantly different [Student's t-test, p=5.7e-11]). Similarly, the heritability estimates were slightly higher in individuals in the top quartile of lymphocyte proportions (average $h^2_{geneExp}$ = 0.10) than in those in the bottom quartile (average $h^2_{geneExp}$ = 0.09) (Student's t-test, p=1.5e-4). The average $h^2_{geneExp}$ for individuals in the top vs. bottom quartiles of neutrophil proportions was not different (average $h^2_{geneExp}$ = 0.10, p=0.6).
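The quartile-based selection and the Bonferroni cutoff described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the source does not say how quartile boundaries or ties were handled (the sketch uses the default method of `statistics.quantiles`), nor how many tests entered the Bonferroni correction (18,000 below is only an example value). The participant identifiers and cell counts are hypothetical.

```python
import statistics

def quartile_groups(values_by_id):
    """Return (bottom, top) lists of IDs in the lowest and highest 25%."""
    q1, _, q3 = statistics.quantiles(list(values_by_id.values()), n=4)
    bottom = sorted(i for i, v in values_by_id.items() if v <= q1)
    top = sorted(i for i, v in values_by_id.items() if v >= q3)
    return bottom, top

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff so that the family-wise level is alpha."""
    return alpha / n_tests

# Hypothetical WBC measurements for eight participants; illustration only.
wbc = {f"p{k}": float(k) for k in range(1, 9)}
bottom_ids, top_ids = quartile_groups(wbc)
print("bottom quartile:", bottom_ids)
print("top quartile:", top_ids)

# Example only: the number of tests is assumed, not given in the source.
print("per-test cutoff:", bonferroni_threshold(0.05, 18_000))
```

Heritability would then be estimated separately within each group and the two sets of estimates compared, as in the paragraph above.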

Summary of eQTL results

The eQTL data were generated using gene expression results from 5257 individuals with both gene expression profiling and genotype dosage data imputed from the 1000 Genomes SNP set. In total, we identified 2,206,283 cis and 160,183 trans eQTLs for 12,407 genes at FDR<0.05. Details of the eQTLs will be reported in a separate paper (Joehanes et al. in preparation).


Supplementary Figures

Supplementary Figure S1: Heritability distribution of the transcriptome in Framingham cohorts. A) Heritability ($h^2$) distribution of transcripts without SNP-probes (n=6,059, see methods); B) Summary of genes in different $h^2$ sub-categories


Supplementary Figure S2: Heritability distribution of transcripts in Framingham offspring cohort and the third generation cohort.


Supplementary Figure S3: Heritability estimates of transcripts in relation to sample size and family structure. The x-axis indicates the 70 genes, 10 genes in each of seven $h^2_{geneExp}$ levels, $h^2_{geneExp}$ = (0, 0.1), (0.1, 0.2), (0.2, 0.3), (0.3, 0.4), (0.4, 0.5), (0.5, 0.6), and (0.6, 1). The y-axis is $h^2_{geneExp}$.


Supplementary Figure S4: Effect size of cis/trans eQTLs in the discovery and replication sets. Cis eQTLs were defined as eQTLs located within 1 Mb upstream or downstream of the transcriptional start site of the eQTL gene, and trans eQTLs were defined as eQTLs located on a different chromosome from the eQTL gene. In this study, we excluded from the trans eQTLs those located on the same chromosome as the eQTL gene but outside the 1 Mb window.
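The cis/trans definitions in this caption can be written as a small Python sketch. The function name, the argument layout, and the label used for the same-chromosome case are hypothetical, and treating the 1 Mb boundary as inclusive is an assumption.

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb on either side of the transcriptional start site

def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_tss):
    """Label a SNP-gene pair following the definitions in the caption.

    - "cis": same chromosome and within 1 Mb of the gene's TSS
    - "trans": SNP and gene on different chromosomes
    - "excluded_from_trans": same chromosome but outside the 1 Mb window
    """
    if snp_chrom != gene_chrom:
        return "trans"
    if abs(snp_pos - gene_tss) <= CIS_WINDOW_BP:  # inclusive boundary assumed
        return "cis"
    return "excluded_from_trans"

# Hypothetical positions; illustration only.
print(classify_eqtl("1", 1_500_000, "1", 1_000_000))
print(classify_eqtl("1", 5_000_000, "1", 1_000_000))
print(classify_eqtl("2", 1_500_000, "1", 1_000_000))
```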


Supplementary Figure S5: Comparison of t statistics of cis (A) and trans (B) eQTLs at FDR<1e-4 before and after accounting for cell count effects. We compared the ratios of BETA/SE (t-values) of eQTLs identified with adjustment for cell types to those identified without adjustment for cell types in the regression models. The t-values remained similar with or without adjusting for cell types. Pearson correlations for the t-values were 0.99 and 0.98 for cis and trans eQTLs, respectively.


Supplementary Figure S6: Relationship between Heritability Estimates and proportion of genes with cis/trans eQTLs. A) Proportion of transcripts having cis eQTLs in different heritability levels; B) Proportion of transcripts having trans eQTLs in different heritability levels. (This result is based on eQTLs from 6,059 genes without SNP-probes, see methods)


Supplementary Figure S7: Proportion of Transcript Variance Explained by Peak eQTLs. A) Variance proportion of a transcript explained by a single peak cis eQTL; B) Variance proportion of a transcript explained by a single peak trans eQTL. (This result is based on eQTLs from 6,059 genes without SNP-probes, see methods)


Supplementary Tables

Supplementary Table S1: Technical covariates utilized for gene expression data normalization

Supplementary Table S2: Pedigree structure formed by Framingham Heart Study participants

Supplementary Table S3: Comparison of h2 estimates in the FHS and in the study by Göring et al.

Supplementary Table S4: Gene Expression Heritability and Their Peak cis eQTL

Supplementary Table S5: Gene Expression Heritability and Their Peak trans eQTL

Supplementary Table S6: Gene Expression Heritability and Their Peak long-range cis eQTL

Supplementary Table S7: Differentially expressed genes for body mass index at FDR<0.05


References

Abecasis GR, Cardon LR, Cookson WO, Sham PC, Cherny SS (2001) Association analysis in a variance components framework. Genet Epidemiol 21 Suppl 1: S341-6

Boardman AE, Hui BS, Wold H (1981) The partial least squares-fix point method of estimating interdependent systems with latent variables. Communications in statistics-theory and methods 10: 613-639

Dixon AL, Liang L, Moffatt MF, Chen W, Heath S, Wong KC, Taylor J, Burnett E, Gut I, Farrall M, Lathrop GM, Abecasis GR, Cookson WO (2007) A genome-wide association study of global gene expression. Nat Genet 39: 1202-7

Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, Zhu J, Carlson S, Helgason A, Walters GB, Gunnarsdottir S, Mouy M, Steinthorsdottir V, Eiriksdottir GH, Bjornsdottir G, Reynisdottir I, Gudbjartsson D, Helgadottir A, Jonasdottir A, Styrkarsdottir U, Gretarsdottir S, Magnusson KP, Stefansson H, Fossdal R, Kristjansson K, Gislason HG, Stefansson T, Leifsson BG, Thorsteinsdottir U, Lamb JR, Gulcher JR, Reitman ML, Kong A, Schadt EE, Stefansson K (2008) Genetics of gene expression and its effect on disease. Nature 452: 423-8

Göring HHH, Curran JE, Johnson MP, Dyer TD, Charlesworth J, Cole SA, Jowett JBM, Abraham LJ, Rainwater DL, Comuzzie AG, Mahaney MC, Almasy L, MacCluer JW, Kissebah AH, Collier GR, Moses EK, Blangero J (2007) Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes. Nature Genetics 39: 1208-1216

Grundberg E, Small KS, Hedman AK, Nica AC, Buil A, Keildson S, Bell JT, Yang TP, Meduri E, Barrett A, Nisbett J, Sekowska M, Wilk A, Shin SY, Glass D, Travers M, Min JL, Ring S, Ho KR, Thorleifsson G, Kong A, Thorsteindottir U, Ainali C, Dimas AS, Hassanali N, Ingle C, Knowles D, Krestyaninova M, Lowe CE, Di Meglio P, Montgomery SB, Parts L, Potter S, Surdulescu G, Tsaprouni L, Tsoka S, Bataille V, Durbin R, Nestle FO, O'Rahilly S, Soranzo N, Lindgren CM, Zondervan KT, Ahmadi KR, Schadt EE, Stefansson K, Smith GD, McCarthy MI, Deloukas P, Dermitzakis ET, Spector TD, Multiple Tissue Human Expression Resource (MuTHER) Consortium (2012) Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nature Genetics 44: 1084-1089

Price AL, Helgason A, Thorleifsson G, McCarroll SA, Kong A, Stefansson K (2011) Single-tissue and cross-tissue heritability of gene expression via identity-by-descent in related or unrelated individuals. PLoS Genetics 7

Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE, Dunning M, Flicek P, Koller D, Montgomery S, Tavare S, Deloukas P, Dermitzakis ET (2007) Population genomics of human gene expression. Nature Genetics 39: 1217-1224

Tenesa A, Haley CS (2013) The heritability of human disease: estimation, uses and abuses. Nat Rev Genet 14: 139-49
