SHI Meng

Transcript of SHI Meng (PowerPoint presentation)

Page 1

SHI Meng

Page 2

Abstract

• The genetic basis of gene expression variation has long been studied with the aim of understanding the landscape of regulatory variants, and more recently to assist in the interpretation and elucidation of disease signals. To date, many studies have examined specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations.

Page 3

Abstract

• We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture, and observe that half of the cis-eQTLs are replicated in one or more of the other populations. We highlight patterns of eQTL sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asian, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs to the transcription start site (TSS) as sharing of eQTLs among populations increases, highlighting that variants close to the TSS have stronger effects and are therefore more likely to be detected across a wider panel of populations. Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation, and provide an estimate of the transferability of complex trait variants across populations.

Page 4

Background

• Human population differentiation
  – neutral DNA sequence
  – functional variants
    • non-synonymous variants
    • eQTLs

• Previous eQTL studies
  – limited to only a few well-defined populations
  – have not contrasted geographically proximate populations

• First analysis of eQTL differentiation among eight human population samples

Page 5

Materials

• LCLs (lymphoblastoid cell lines)
• Samples:
  – 726 individuals from 8 HapMap populations

• Expression data:
  – Sentrix Human-6 Expression BeadChip version 2
  – 47,294 transcripts, plus controls
  – 21,800 probes mapping to 18,226 unique autosomal Ensembl genes

• Genotype data:
  – MAF > 0.05, < 20% missing data
  – 1.1 to 1.3 million SNPs per population

Sample sizes by population:

  Population    CEU  CHB  GIH  JPT  LWK  MEX  MKK  YRI  Total
  Individuals   109   80   82   82   82   45  138  108    726
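The genotype filters listed above can be sketched as a column mask over a genotype matrix. This is a minimal illustration, not the study's QC pipeline: it assumes genotypes are stored as 0/1/2 allele dosages in a NumPy array with NaN marking missing calls, and `filter_snps` is a hypothetical helper name.

```python
import numpy as np


def filter_snps(genotypes, min_maf=0.05, max_missing=0.20):
    """Boolean mask of SNPs (columns) with MAF > min_maf and missing rate < max_missing.

    genotypes: (n_individuals, n_snps) array of 0/1/2 dosages, NaN = missing call
    (an assumed encoding).
    """
    g = np.asarray(genotypes, dtype=float)
    is_called = ~np.isnan(g)
    missing_rate = 1.0 - is_called.mean(axis=0)
    n_called = is_called.sum(axis=0)
    # Frequency of the counted allele among non-missing genotype calls.
    allele_freq = np.nansum(g, axis=0) / (2.0 * np.maximum(n_called, 1))
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    return (maf > min_maf) & (missing_rate < max_missing)


# Toy example: SNP 0 passes, SNP 1 is monomorphic, SNP 2 is 75% missing.
g = np.array([[0, 2, np.nan],
              [1, 2, np.nan],
              [0, 2, 1],
              [2, 2, np.nan]])
print(filter_snps(g))  # [ True False False]
```

The mask would be applied per population, which is consistent with the slide reporting a different number of retained SNPs for each population.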

Page 6

Methods

• Raw expression data normalization
  – log2 scale
  – quantile normalization across replicates of a single individual
  – mean normalization across all individuals of the eight populations
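The three normalization steps can be sketched with NumPy as below. Two details are assumptions, since the slide does not specify them: replicate arrays are averaged after quantile normalization, and "mean normalization" is taken to mean shifting each individual's array so that its mean equals the grand mean over all individuals. Function names are hypothetical.

```python
import numpy as np


def quantile_normalize(x):
    """Give every column of x (probes x replicates) the same value distribution.

    Ties are broken by order of appearance, a simplification.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)
    # Reference distribution: mean of the k-th smallest value across columns.
    reference = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference
    return out


def normalize_expression(raw_by_individual):
    """raw_by_individual: list of (probes x replicates) raw intensity arrays, one per individual.

    Returns a (probes x individuals) matrix on the log2 scale.
    """
    per_individual = []
    for raw in raw_by_individual:
        logged = np.log2(raw)                    # log2 scale
        qn = quantile_normalize(logged)          # across replicates of one individual
        per_individual.append(qn.mean(axis=1))   # assumption: replicates averaged afterwards
    expr = np.column_stack(per_individual)
    # Assumed mean normalization: each individual's mean is shifted to the grand mean.
    return expr - expr.mean(axis=0) + expr.mean()
```

Quantile normalization removes array-to-array intensity differences within an individual, while the final shift only aligns overall levels between individuals and leaves each array's internal spread untouched.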

• Population stratification correction of expression data
  – admixed populations: GIH, LWK, MEX, MKK
  – EIGENSTRAT: principal components based on genotype
  – expression values were adjusted for each population using the ten primary axes of variation from the corresponding intra-population PCA
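The stratification correction amounts to computing genotype principal components within a population and regressing them out of each probe's expression. The sketch below standardizes each SNP by its sample standard deviation, a simplification of EIGENSTRAT's allele-frequency-based scaling, and assumes complete genotype data; the helper names are hypothetical.

```python
import numpy as np


def genotype_pcs(genotypes, n_pcs=10):
    """Top principal-component scores of an (individuals x SNPs) dosage matrix."""
    g = np.asarray(genotypes, dtype=float)
    sd = g.std(axis=0)
    # Simplified scaling by per-SNP standard deviation; monomorphic SNPs become zeros.
    z = (g - g.mean(axis=0)) / np.where(sd > 0, sd, np.inf)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def adjust_for_pcs(expression, pcs):
    """Regress PCs (plus an intercept) out of every probe in an (individuals x probes) matrix."""
    y = np.asarray(expression, dtype=float)
    design = np.column_stack([np.ones(len(y)), pcs])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Residuals with each probe's mean added back, keeping values on the original scale.
    return y - design @ beta + y.mean(axis=0)
```

Run once per population, e.g. `adjust_for_pcs(expr_gih, genotype_pcs(geno_gih))` with hypothetical array names, so that each population is corrected with its own ten axes rather than axes that mostly separate populations from one another.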

Page 7

Methods

• Correction for known and unknown factors: "REDUCED" dataset generation
  – probabilistic estimation of expression residuals (PEER) framework
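PEER itself is a Bayesian factor-analysis model and is not reproduced here. As a much simpler stand-in for the same idea (removing broad hidden sources of expression variation before association testing), the sketch below removes the top k principal components of the expression matrix. This is PCA-based correction, not PEER, and `remove_hidden_factors` and `k` are hypothetical.

```python
import numpy as np


def remove_hidden_factors(expression, k):
    """Remove the top-k principal components from an (individuals x probes) matrix.

    A PCA stand-in for factor-based correction; it is NOT the PEER model.
    """
    y = np.asarray(expression, dtype=float)
    mean = y.mean(axis=0)
    centered = y - mean
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    hidden = (u[:, :k] * s[:k]) @ vt[:k]   # variation captured by the top-k factors
    return centered - hidden + mean        # residual expression, probe means restored
```

Unlike this sketch, PEER can also model known covariates explicitly and infer the number of relevant factors, which is one reason it is preferred for eQTL work.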

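• Structure of gene expression variation among populations
  – Vst: $V_{ST} = (V_T - V_S)/V_T$, where $V_S = (V_1 n_1 + V_2 n_2)/(n_1 + n_2)$; $V_T$ is the expression variance across both populations combined, $V_i$ the variance within population $i$, and $n_i$ its sample size
  – top 5% of probes by Vst: GO term enrichment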
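The Vst formula translates directly into code for a single probe and a pair of populations. Population (ddof=0) variances are an assumption here; with them, the law of total variance guarantees $V_T \ge V_S$, so Vst falls between 0 and 1 (it is undefined for a probe with zero total variance). The helper names are hypothetical.

```python
import numpy as np


def vst(expr_pop1, expr_pop2):
    """Vst = (V_T - V_S) / V_T for one probe measured in two populations."""
    x1 = np.asarray(expr_pop1, dtype=float)
    x2 = np.asarray(expr_pop2, dtype=float)
    n1, n2 = len(x1), len(x2)
    v_t = np.concatenate([x1, x2]).var()                 # variance of the pooled sample
    v_s = (x1.var() * n1 + x2.var() * n2) / (n1 + n2)    # sample-size-weighted within variance
    return (v_t - v_s) / v_t


def top_fraction(scores, fraction=0.05):
    """Indices of probes in the top `fraction` of Vst scores (e.g. input to GO enrichment)."""
    scores = np.asarray(scores, dtype=float)
    cutoff = np.quantile(scores, 1.0 - fraction)
    return np.flatnonzero(scores >= cutoff)


print(vst([0, 0], [2, 2]))  # 1.0: all variance lies between populations
print(vst([0, 2], [0, 2]))  # 0.0: all variance lies within populations
```

Vst plays the role for expression that FST plays for allele frequencies: it measures the share of total variance attributable to differences between populations.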

Page 8

Methods

• Association and multiple-test correction (individual populations)
  – cis: SNPs within 1 Mb of the TSS
  – association: Spearman rank correlation (SRC) model
  – significance assessment:
    • 10,000 permutations of each phenotype (probe) relative to the genotypes
    • threshold: 0.01
  – FDR:
    • 1 - (number of genes with replication / total number of significant genes)
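The per-probe test can be sketched as follows: Spearman's rho is computed as the Pearson correlation of tie-averaged ranks, and the gene-level threshold is taken from the distribution of the best |rho| across cis SNPs in expression-permuted data. Two choices are assumptions: max |rho| is used as the gene-level statistic (it ranks SNPs the same way as the smallest asymptotic p-value when no genotypes are missing), and "threshold: 0.01" is read as the 1% tail of that permutation distribution. Function names are hypothetical.

```python
import numpy as np


def average_rank(x):
    """1-based ranks with ties averaged (genotype dosages 0/1/2 are heavily tied)."""
    x = np.asarray(x)
    sorter = np.argsort(x, kind="mergesort")
    inverse = np.empty_like(sorter)
    inverse[sorter] = np.arange(len(x))
    xs = x[sorter]
    starts = np.r_[True, xs[1:] != xs[:-1]]          # first element of each tie group
    group = starts.cumsum()[inverse]                  # tie-group id of each element
    bounds = np.r_[np.flatnonzero(starts), len(x)]    # group start offsets plus the end
    return 0.5 * (bounds[group] + bounds[group - 1] + 1)


def standardize(a):
    """Center and scale columns (ddof=0); constant columns become all zeros."""
    centered = a - a.mean(axis=0)
    sd = centered.std(axis=0)
    return centered / np.where(sd > 0, sd, np.inf)


def cis_eqtl_test(expression, cis_genotypes, n_perm=10_000, alpha=0.01, seed=0):
    """Spearman association of one probe with its cis SNPs plus a permutation threshold.

    expression: (n,) values; cis_genotypes: (n, m) dosages of SNPs within 1 Mb of the TSS.
    Returns (observed max |rho|, permutation threshold, per-SNP rho).
    """
    n = len(expression)
    e = standardize(average_rank(expression))
    g = standardize(np.column_stack([average_rank(c) for c in np.asarray(cis_genotypes).T]))
    rho = g.T @ e / n                                   # Spearman rho for each SNP
    rng = np.random.default_rng(seed)
    # Permute the phenotype relative to the genotypes; ranks permute with the values.
    perm = np.column_stack([rng.permutation(e) for _ in range(n_perm)])
    null_max = np.abs(g.T @ perm / n).max(axis=0)      # best |rho| in each permutation
    threshold = np.quantile(null_max, 1.0 - alpha)
    return np.abs(rho).max(), threshold, rho
```

A probe would be called as having a cis-eQTL when the observed max |rho| exceeds its own threshold. Taking the maximum over all cis SNPs inside each permutation is what makes the threshold account for the number of SNPs tested per gene.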

Page 9

Methods

• Stepwise association model
  – goal: determine whether independent cis-regulatory signals exist for a given gene
  – steps:
    • regress the effect of the most-significant SNP out of the expression levels
    • re-run the SRC analysis
    • store the SNPs with p-values more significant than the gene's permutation threshold
    • repeat until no SNPs from the initial significant eQTL list are left to test
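The stepwise procedure can be sketched as a loop that picks the strongest remaining SNP, regresses its linear effect out of the expression values, and re-tests the SNPs that still pass. The sketch assumes an |rho| cut-off in place of the gene's permutation p-value threshold and a linear fit on allele dosage for the regression step; `stepwise_cis`, `spearman`, and `rho_threshold` are hypothetical names.

```python
import numpy as np


def average_rank(x):
    """1-based ranks with ties averaged."""
    x = np.asarray(x)
    sorter = np.argsort(x, kind="mergesort")
    inverse = np.empty_like(sorter)
    inverse[sorter] = np.arange(len(x))
    xs = x[sorter]
    starts = np.r_[True, xs[1:] != xs[:-1]]
    group = starts.cumsum()[inverse]
    bounds = np.r_[np.flatnonzero(starts), len(x)]
    return 0.5 * (bounds[group] + bounds[group - 1] + 1)


def spearman(y, genotypes):
    """Spearman rho between y and each column of genotypes (Pearson on average ranks)."""
    def z(a):
        centered = a - a.mean(axis=0)
        sd = centered.std(axis=0)
        return centered / np.where(sd > 0, sd, np.inf)
    ry = z(average_rank(y))
    rg = z(np.column_stack([average_rank(c) for c in genotypes.T]))
    return rg.T @ ry / len(y)


def stepwise_cis(expression, genotypes, snp_ids, rho_threshold):
    """Independent cis signals for one gene, in order of discovery.

    genotypes: (n, m) dosages of the gene's initially significant cis SNPs.
    """
    y = np.asarray(expression, dtype=float)
    g = np.asarray(genotypes, dtype=float)
    remaining = list(range(g.shape[1]))
    selected = []
    while remaining:
        rho = np.abs(spearman(y, g[:, remaining]))
        top = int(np.argmax(rho))
        if rho[top] <= rho_threshold:
            break                                   # nothing left passes the threshold
        best = remaining[top]
        selected.append(snp_ids[best])
        # Regress the most-significant SNP's (linear) effect out of the expression levels.
        design = np.column_stack([np.ones(len(y)), g[:, best]])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        y = y - design @ beta
        # Keep only SNPs that still passed this round; the chosen SNP is retired.
        remaining = [j for j, r in zip(remaining, rho) if r > rho_threshold and j != best]
    return selected
```

A SNP in strong linkage disequilibrium with an already selected SNP loses its signal once that SNP's effect is regressed out, so only genuinely independent signals survive to later rounds.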

Page 10

Results

• Structure of gene expression variation among populations
  – expression-based PCA plot: populations do not separate distinctly
  – Vst:
    • Vst values are heavily skewed toward values near 0
    • the amount of Vst between a pair of populations is correlated with their degree of genetic distance
    • the vast majority of genes do not exhibit highly differentiated expression variation between populations
  – probes exhibiting the top 5% of Vst scores are enriched in GO terms
    • significant population-specific GO term enrichment
    • GO terms corresponding to genes significantly diverged in expression in one population are also diverged in expression in other, closely related populations

Page 11
Page 12

Results

• Cis associations of gene expression with SNPs

Page 13

Results

• Multiple effects underlying cis-eQTLs
  – criterion: at least two significant cis-eQTL SNPs at the 0.01 permutation threshold
  – a total of 33 genes with multiple eQTLs (0–2% per population across the 8 populations)
  – at most, a single gene had five independently associated SNPs

Page 14

Results

• Population sharing of cis-eQTLs
  – 1,074 (34%) of 3,130 genes had a significant cis-eQTL in at least two populations
  – more closely related populations tend to share more cis-associated genes than more distantly related populations
  – 98.9–100% concordance of allelic direction
  – effect size (fold difference between the two homozygous genotype classes of a SNP) is shared between any two populations when the association is also shared
  – population-specific discovery of an eQTL is mainly due to allele frequency differences, not to differences in absolute effect size
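The effect-size and direction comparisons can be sketched as follows. Expression is assumed to be on the log2 scale produced by the normalization step, so the difference between the mean expression of the two homozygote classes is a log2 fold change; using group means (rather than, say, medians) is an assumption, and the function names are hypothetical.

```python
import numpy as np


def homozygote_fold_difference(log2_expr, dosage):
    """Fold difference between the two homozygote classes (dosage 2 vs dosage 0)."""
    y = np.asarray(log2_expr, dtype=float)
    d = np.asarray(dosage)
    log2_fc = y[d == 2].mean() - y[d == 0].mean()   # difference of log2 means
    return 2.0 ** log2_fc


def direction_concordance(fc_pop1, fc_pop2):
    """Fraction of shared eQTLs whose allelic direction (fold > 1 or < 1) agrees."""
    s1 = np.sign(np.log2(np.asarray(fc_pop1, dtype=float)))
    s2 = np.sign(np.log2(np.asarray(fc_pop2, dtype=float)))
    return float(np.mean(s1 == s2))


print(homozygote_fold_difference([1.0, 2.0, 3.0, 3.0], [0, 1, 2, 2]))  # 4.0
```

Comparing fold differences rather than correlation coefficients separates effect size from allele frequency: the same fold difference yields a weaker correlation, and so lower detection power, in a population where the SNP is rare.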

Page 15
Page 16
Page 17

Results

• Genomic properties of eQTLs
  – the majority of association signals are approximately symmetrically centered on the TSS
  – the strongest statistical signals are located directly at the TSS
  – as population sharing increases from one population to all eight, the distribution tightens gradually around the TSS
  – SNPs associated with more than one gene
    • 264 genes
    • 52 clusters of 2 or more genes in at least two populations
    • distance to the TSS: larger
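The TSS-centered analysis relies on a strand-aware distance from each eQTL SNP to its gene's TSS. The sketch below computes that distance and summarizes how tightly eQTLs cluster around the TSS at each level of population sharing; summarizing the spread with the median absolute distance is an assumption, not necessarily the statistic used in the study, and the helper names are hypothetical.

```python
import numpy as np


def signed_tss_distance(snp_pos, tss_pos, strand):
    """SNP position relative to the TSS in the gene's direction of transcription.

    Negative = upstream, positive = downstream; strand is '+' or '-'.
    """
    d = np.asarray(snp_pos) - np.asarray(tss_pos)
    return np.where(np.asarray(strand) == "+", d, -d)


def spread_by_sharing(distances, n_populations_sharing):
    """Median absolute TSS distance for each sharing level (1 to 8 populations)."""
    dist = np.abs(np.asarray(distances))
    levels = np.asarray(n_populations_sharing)
    return {int(k): float(np.median(dist[levels == k])) for k in np.unique(levels)}


print(signed_tss_distance([1500, 900], [1000, 1000], ["+", "-"]))  # [500 100]
```

Flipping the sign for minus-strand genes puts all genes in a common upstream/downstream frame, which is what allows the symmetry of signals around the TSS to be assessed at all.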

Page 18
Page 19
Page 20

Results

• eQTLs and disease
  – 62 SNPs from the GWAS catalog
    • each is the most-significant SNP of a cis-eQTL in at least one population
    • 57 Ensembl genes and 51 traits
      – e.g. alcohol dependence, Crohn's disease, ...
    • 15 (24%) were also the most-significant SNP of the same gene in at least one additional population
  – can assist in fine-mapping causal variants for complex traits
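Cross-referencing GWAS catalog SNPs with eQTL results reduces to a set lookup keyed on SNP identifier. The sketch assumes two in-memory mappings (GWAS SNP to trait, and (population, gene) to that gene's most-significant cis-eQTL SNP); the function names are hypothetical and the identifiers in the example are placeholders, not real data.

```python
def gwas_eqtl_overlap(gwas_snps, top_eqtl_snp):
    """GWAS SNPs that are the most-significant cis-eQTL SNP for some (population, gene).

    gwas_snps: dict SNP id -> trait; top_eqtl_snp: dict (population, gene) -> SNP id.
    Returns {SNP id: set of (population, gene)}.
    """
    hits = {}
    for (population, gene), snp in top_eqtl_snp.items():
        if snp in gwas_snps:
            hits.setdefault(snp, set()).add((population, gene))
    return hits


def replicated_in_another_population(hits):
    """SNPs that are the top SNP for the same gene in at least two populations."""
    replicated = set()
    for snp, pairs in hits.items():
        genes = [gene for _, gene in pairs]
        if any(genes.count(gene) >= 2 for gene in set(genes)):
            replicated.add(snp)
    return replicated


gwas = {"snpA": "trait X", "snpB": "trait Y"}
top = {("CEU", "GENE1"): "snpA", ("YRI", "GENE1"): "snpA",
       ("CHB", "GENE2"): "snpB", ("CEU", "GENE3"): "snpC"}
hits = gwas_eqtl_overlap(gwas, top)
print(sorted(hits))                            # ['snpA', 'snpB']
print(replicated_in_another_population(hits))  # {'snpA'}
```

Requiring the same SNP to be the top hit for the same gene in more than one population is what makes the overlap useful for fine-mapping: differing linkage disequilibrium patterns across populations narrow the set of plausible causal variants.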

Page 21

Discussion

• extensive sharing of eQTLs across human populations
• effect size and direction of effect for eQTLs: highly conserved
• symmetric distribution of eQTLs around the TSS

• additional cell types under a variety of cellular and developmental conditions

• how the frequency spectrum of regulatory variants has been shaped by selective and demographic processes

• how these functional variants contribute to higher-order phenotypes

• methods to preprocess microarray data and detect eQTLs
• comprehensive analysis of eQTLs and functional association

Page 22

Thank you!