
Introduction to Genetic Epidemiology

HRM 728 - 2015

Course Coordinator: Dr. Sonia Anand

Course Dataset Assistant: Binod

Course Outline

• 14 classes

• Mid-Term Assignment: 16-October-2015

• Help Session/Analytical Questions using PLINK – Nov 20, 2015

• Final Exam – Dec 4, 2015

• Final Assignment-Independent Study Presentation - Dec 11, 2015

Student Evaluation

• Class Attendance/Participation: 15%

• Mid-Term Assignment: 25% (5-page, single-spaced scholarly summary; topic preapproved by Dr. Anand)

• Final Exam: 25%

• Independent Study: 35% including class presentation

Seminar 1

• Key Concepts in Genetic Epidemiology

– What does genetic epidemiology mean to you?

Biology

Epidemiology

Statistics

Timeline

• 1865: Mendel discovers laws of genetics

• 1900: Rediscovery of Mendel’s genetics

• 1944: DNA identified as hereditary material

• 1953: DNA structure

• 1960’s: Genetic code

• 1975-79: First human genes isolated

• 1977: Advent of DNA sequencing

• 1986: DNA sequencing automated

• 1990: Human genome project officially begins

• 1995: First whole genome

• 1999: First human chromosome

• 2003: ‘Finished’ human genome sequence (~50 years after the DNA structure)

The Human genome project promised to revolutionise medicine and explain every base of our DNA.

Large MEDICAL GENETICS focus

Identify variation in the genome that is disease causing

Determine how individual genes play a role in health and disease

The Human genome project

The two Human Genome Projects

PUBLIC - Watson/Collins

• Human Genome Project

• Officially launched in 1990

• Worldwide effort - both academic and government institutions

• Assemble the genome using maps

• 1996 Bermuda accord

PRIVATE - Craig Venter

• 1998 Celera Genomics

• Aim to sequence the human genome in 3 years

• ‘Shotgun’ approach - no use of maps for assembly

• Data release NOT to follow Bermuda principles

The Human genome project

It cost about 3 billion dollars and took 13 years to complete (1990 to 2003), 2 fewer than initially predicted.

• Currently 3.2 Gb

• Approx 200 Mb still in progress

– Heterochromatin

– Repetitive

• Most recent human genome uploaded February 2009

How Are Traits Transmitted from Parents to Offspring?

• Gregor Mendel’s experiments showed that genes are passed from parents to offspring

– Each parent carries two genes that control a trait

– Each parent contributes one copy from each pair

– Pairs of genes separate from each other during the formation of egg and sperm (meiosis)

– When egg and sperm fuse during fertilization, genes from mother and father become a new gene pair

• Genes are contained on chromosomes

– Chromosomes are found in the nucleus of cells in humans and other higher organisms

– Meiosis separates chromosome pairs during formation of egg and sperm

Concept of Heritability

• Proportion of a trait’s total variance that is attributable to genetic factors in a particular population

• Trait: quantitative or continuous trait, e.g. height

• “Attributable to” is not the same as “caused by”

• If everyone in the population were genetically identical (e.g. homozygous for the same alleles), genetic factors would play no role in the variance of the trait and heritability = zero; likewise, if everyone had the same environmental exposure, environment would play no role in the variance
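A minimal numerical sketch of the definition above, assuming the simplest variance decomposition in which total variance is the sum of a genetic and an environmental component (no covariance or interaction). The function name and the variance values are hypothetical and chosen only for illustration.

```python
def broad_sense_heritability(var_genetic: float, var_environment: float) -> float:
    """Return V_G / (V_G + V_E): the share of total trait variance due to genetic factors.

    Assumes genetic and environmental effects are independent, which is a
    simplification made for this sketch.
    """
    total = var_genetic + var_environment
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_genetic / total


# Hypothetical variance components for a trait such as height (illustration only).
h2_mixed = broad_sense_heritability(var_genetic=40.0, var_environment=10.0)

# If there is no genetic variation in the population, V_G = 0 and heritability is zero.
h2_no_genetic_variation = broad_sense_heritability(var_genetic=0.0, var_environment=10.0)

print(h2_mixed, h2_no_genetic_variation)
```

Because the ratio depends on how much variation is present in a given population, the same trait can have different heritability in different populations.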

Hardy-Weinberg Law of Population Genetics

• Assume random mating in a population

• In a two-allele system, homozygosity and heterozygosity balance out

• Allele and genotype frequencies will remain the same if:

– Organisms reproduce

– Allele frequencies are the same in both sexes

– Loci segregate independently

– Mating is random with respect to genotype

Hardy-Weinberg Law of Population Genetics

Frequency of alleles in population: p + q = 1 (p = dominant allele, q = recessive allele)

Frequency of genotypes in population: p² + 2pq + q² = 1
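The two equations can be checked numerically. The sketch below computes the expected genotype frequencies for a given allele frequency and a Pearson chi-square statistic comparing observed genotype counts with Hardy-Weinberg expectations. The function names and the genotype counts are hypothetical; the chi-square test is one common way to assess Hardy-Weinberg equilibrium and is not taken from the slides.

```python
from typing import Tuple


def hardy_weinberg_expected(p: float) -> Tuple[float, float, float]:
    """Expected genotype frequencies (AA, Aa, aa) when allele A has frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom) against HWE expectations."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Allele frequency of A estimated from the observed genotype counts.
    p = (2 * n_AA + n_Aa) / (2 * n)
    expected = [freq * n for freq in hardy_weinberg_expected(p)]
    observed = [n_AA, n_Aa, n_aa]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


expected_freqs = hardy_weinberg_expected(0.6)   # p = 0.6, q = 0.4
chi2_in_hwe = hwe_chi_square(360, 480, 160)     # counts that match HWE for p = 0.6
chi2_no_hets = hwe_chi_square(500, 0, 500)      # no heterozygotes: strong departure
print(expected_freqs, chi2_in_hwe, chi2_no_hets)
```

A statistic near zero indicates counts consistent with Hardy-Weinberg proportions; large values suggest a departure, for example from genotyping error or non-random mating.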

GENETIC EPIDEMIOLOGY: Flow of research

• Disease characteristics: Descriptive epidemiology

• Familial clustering: Family aggregation studies

• Genetic or environmental: Twin/adoption/half-sibling/migrant studies

• Mode of inheritance: Segregation analysis

• Disease susceptibility loci: Linkage analysis

• Disease susceptibility markers: Association studies

Why do we care about variations?

• Underlie phenotypic differences

• Cause inherited diseases

• Allow tracking ancestral human history

October 2004

Human Genome

• ~30,000 genes

• 3 billion base pairs in the human genome

• 15 million SNPs in human genome

• Human Diversity = 0.5%

• Far less than other animals like the chimp (because humans are younger)

• Patterns of Linkage Disequilibrium (LD) are informative about population histories

SNPs

• SNPs are the more common variants (> 5%)

• Most mutations will disappear, but some will achieve higher frequencies due either to random genetic drift or to selective pressure

• Base substitution through a non-repaired error that occurs during DNA replication

• Low mutation rate: 10⁻⁸ substitutions per base pair per generation

• Majority of SNPs are inherited - not de novo mutations

SNPs persistence influenced by 2 forces

• 1) Random Genetic Drift – random sampling of alleles in each generation (because only a small fraction of gametes pass on to the next generation); eventually FIXATION occurs when an allele reaches 100% or 0%

• 2) Natural Selection – affects the probability that a SNP is passed to the next generation: it speeds up fixation if the allele confers a fitness advantage (positive selection), removes new deleterious variants from the gene pool (negative selection), or maintains both alleles (balancing selection)
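Random genetic drift can be illustrated with a small simulation. The sketch below assumes a Wright-Fisher style model (constant population size, non-overlapping generations, no selection or mutation), which is a standard textbook simplification rather than something specified in the slides. The function name, population size, and seed are hypothetical.

```python
import random


def wright_fisher(p0: float, n_individuals: int, max_generations: int, seed: int = 0):
    """Simulate drift of one biallelic SNP in a diploid population of constant size.

    Each generation, 2N allele copies are drawn at random using the previous
    generation's allele frequency (binomial sampling). Returns the list of
    allele frequencies, stopping early once the allele is fixed (1.0) or lost (0.0).
    """
    rng = random.Random(seed)
    n_copies = 2 * n_individuals
    p = p0
    freqs = [p]
    for _ in range(max_generations):
        if p in (0.0, 1.0):
            break
        # Count how many of the 2N sampled gametes carry the allele.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs


trajectory = wright_fisher(p0=0.5, n_individuals=20, max_generations=2000, seed=1)
print("generations simulated:", len(trajectory) - 1, "final frequency:", trajectory[-1])
```

With a small population the allele frequency wanders quickly to 0% or 100%; increasing `n_individuals` slows the process, which is why drift matters most in small populations.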

Linkage Disequilibrium

• Chromosomes are mosaics

• Patterns of LD are informative about population histories and depend on:

– Recombination rate

– Mutation rate

– Population size

– Natural selection

Conrad Nature Genetics 2006
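LD between two SNPs is commonly summarised by the coefficient D and the squared correlation r². The slides do not give these formulas, so the sketch below uses the standard definitions as an assumption; the function name and the haplotype frequencies are hypothetical.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # D measures how far the haplotype frequency departs from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Alleles A and B always travel together: complete LD.
d_complete, r2_complete = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)

# Haplotype frequency equals the product of allele frequencies: no LD.
d_none, r2_none = ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5)

print(d_complete, r2_complete, d_none, r2_none)
```

An r² close to 1 means one SNP is a near-perfect proxy for the other, which is what makes indirect association through nearby markers possible.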

Progress in Genetics

• 1866 – Gregor Mendel suggested traits were inherited

• 1869 – Friedrich Miescher isolated DNA

• 1953 – Double Helix Structure of DNA – Watson, Crick, Rosalind Franklin

• 1975 – Sanger Sequencing – “1st Generation”

• 2003 – Human Genome “Crack the Code”

• International HapMap Project

• Automated Sequencing

• 1000 Genomes

2nd generation sequencing

Genome wide annotation of functional elements made easy!

Background into 1000 genomes

• International collaboration

• Sequence whole genome of approximately 2000 individuals from ~20 populations

• Central goal is to describe most of the genetic variation that occurs at a population frequency greater than 1%

• Help scientists:

– Identify genetic variation with high resolution

– Improved imputation

– Novel genotype-phenotype associations

– Causal variants

– More accurately study evolutionary process & racial differences

The 1000 Genomes Project Consortium (2012). An integrated map of genetic variation from 1,092 human genomes. Nature. DOI: 10.1038/nature11632

Population-specific genetic variation at high resolution

• Observe and identify population-specific genetic variation

• Novel SNPs are rare and more likely to be observed in one ethnic group

• Need good coverage in multiple populations

• Identification of such variants can help develop new population-specific arrays, minimizing the ascertainment bias that currently exists because most arrays are derived from Europeans

Imputation to GWAS

• Provide a resource to aid imputation of missing genotypes in association studies

• From the pilot study, the authors found that each GWAS signal was in LD with 56 variants on average, and 19% of the time a coding variant was among them

• Shows that 1000 Genomes can be used to find potentially functional variants corresponding to GWAS hits

Identification of causal variants

• Precise causative genes are difficult to identify as GWAS focus on LD / genomic regions

• Deep sequencing studies can help find novel or rare functional variants

• Re-sequencing studies support this approach in uncovering rarer variants with larger effects and functional links to disease (Nejentsev 2009)

From the Pilot phase

• Describes genomes from 1,092 individuals representing 14 populations across Europe, Africa, Asia, and the Americas

1000 Genomes

[Figure: the fraction of variants identified across the project that are found in only one population (white line), are restricted to a single ancestry-based group (solid colour), are found in all groups (solid black line) and all populations (dotted black line)]

1000 Genomes

• Most common variants were almost always present in all 14 populations

• Degree of rare variants differed greatly

From Genetics to Genomics

Genetics

• Disease

• Single Gene Disorders

• Mutations/One Gene

• High Disease Risk

• Environment Role +/-

• “Genetic Services”

Genomics

• Information

• All Diseases

• Variation/Multi Genes

• Low Disease Risk

• Environment Role ++

• Gene-Environment Interactions

Common Complex Diseases

• A condition such as CVD is common

• Includes closely related but not identical manifestations – angina, unstable angina, MI

• Multiple genes have small effects (RR of 1.2 to 1.5) and affect multiple “risk factors” or intermediate phenotypes

• Causative genotype may be the more common genotype (unlike monogenic disorders)

What are we trying to study?

"It's a classic scientific paradox: we know a genotype and we know a phenotype, but there's a black box in between"

[Figure: Genetic Association Studies. Elements shown between genotype and phenotype: SNP Variation, Gene Expression, Protein Synthesis, Post-Translational Changes, Protein Expression, Disease, with Other Risk Factors and Environmental Exposure]

Indirect and Direct Allelic Association

Direct Association

Measure disease relevance (*) directly, ignoring correlated markers nearby.

Indirect Association & LD

Assess trait effects on D via correlated markers (M1, M2, ..., Mn) rather than susceptibility/etiologic variants.

Semantic distinction between:

• Linkage Disequilibrium: correlation between (any) markers in population

• Allelic Association: correlation between marker allele and trait

Marchini, 2004 (www)

Population Stratification

Hunter, 2005 (www)

Models of gene–environment interactions

Hunter, 2005 (www)

Sample size requirement for gene-environment interaction studies

Hunter, 2005 (www)

An example of a gene-environment interaction

In Alzheimer disease, the risk of cognitive decline as measured by the TICS test is particularly high in APOE4 carriers who have untreated hypertension (APOE4+/HT+).

Ascertainment Bias

• Case-control studies are particularly prone to ascertainment bias: unlike in a population-based study, cases and controls can be enriched for the factors investigators want to focus on (in the case of diabetes, hyperglycaemia)

• For TCF7L2 (rs7903146), the T-allele can appear to be associated with lower BMI in control samples. Although the T-allele causes hyperglycaemia, controls are selected to be normoglycaemic, so T-allele carriers among the controls tend to be those with higher physical activity levels or lower BMI

Future Directions: Beyond DNA & RNA

“Omic” approach: technology; number estimated in humans

• Genomics: single nucleotide polymorphisms (SNPs); ~10,000,000

• Transcriptomics: microarrays of gene transcripts (RNA); ~20,000

• Proteomics: protein arrays of specific protein products; ~100,000

• Metabolomics: metabolic profiles; 1000 – 10,000 metabolites

*adapted from Ginsburg G, et al. J Am Coll Cardiol. 2005;46:1615-1627.

Height and Risk of Coronary Artery Disease

Paper by Gertler et al. from 1951 reported that individuals who suffered from a myocardial infarction before the age of 40 were on average 5 cm (2.9%) shorter than a healthy control population

Gertler MM, Garn SM, White PD. The Journal of the American Medical Association 1951

Short stature is associated with coronary heart disease: a systematic review of the literature and a meta-analysis.

Paajanen TA, Oksala NKJ, Kuukasjärvi P, Karhunen PJ. European Heart Journal 2010

Methods

• Selection of studies for review:

– Systematic reviews, meta-analyses, randomized clinical trials, clinical trials, and cohort or case-control studies with at least 200 subjects

– Height dichotomized into short and tall groups

– Outcome defined as diagnosis of angina pectoris, ischaemic heart disease (IHD) or heart disease without MI, acute MI, or history of MI, coronary artery occlusion equal to or more than 50%, revascularization or percutaneous transluminal coronary angioplasty (PTCA), as well as all-cause mortality, CVD mortality, or CHD mortality

• Meta-analysis:

– I-squared test for heterogeneity of data

– ORs and RRs from all studies converted to RRs for shorter group

Results

• Average cut-off for shorter group was 160.5 cm and cut-off for taller group was 173.9 cm, with different ranges for men and women

• Combined RR for shorter group to experience CHD was 1.46 (95% CI 1.37–1.55)

• Combined RR for all-cause mortality for short men was 1.37 (1.29–1.46) and for short women 1.55 (1.41–1.70)

• Combined RR for all types of cardiovascular (CVD) deaths among men and women was 1.55 (95% CI 1.37–1.74)

• Overall, short stature represents a ~1.5 times increased risk of CHD morbidity and mortality compared with tall stature

New Approach to crack the question

Using a genetic approach to explore the association between height and CAD risk helps remove some of the lifestyle and environmental confounders present in epidemiological studies

• Background:

– 180 single-nucleotide polymorphisms (SNPs) were found to be significantly associated with height (GIANT study in Europeans, n=183,727)

• Aims:

– Assess combined effect of 180 height-associated SNPs on CAD risk

– Assess effect of these SNPs on CAD risk factors (e.g. blood pressure, LDL, etc.)

– Identify any biological pathways mediating this association

Nelson NEJM 2015

Study Population

• Summary association statistics extracted from 3 meta-analyses of GWAS case-control studies of CAD:

– Coronary Artery Disease Genomewide Replication and Meta-Analysis (CARDIoGRAM) Consortium: 21,977 cases, 62,289 controls; all 180 SNP variants

– Coronary Artery Disease (C4D) Consortium: 17,766 cases, 17,115 controls; all 180 SNP variants

– Metabochip (combined CARDIoGRAM+C4D Consortium, for cohorts not included in previous meta-analyses): 25,323 cases, 48,979 controls; 112 SNP variants

Nelson NEJM 2015

Advantages of genetic approach in this study over traditional epidemiologic approach:

– Genetic determinants of height are not confounded by lifestyle (e.g. nutrition) or environmental (e.g. socioeconomic status) factors

– Allows tracing of genetic pathways to identify potential mechanisms driving association

Limitations:

– Lifestyle and environmental choices/events can be a direct consequence of height

Height-Associated Variants and CAD - Methods

• Using:

– β1 = effect size of association between variant and height (GIANT study)

– β2 = effect size of association between variant and CAD (CARDIoGRAM, C4D, and Metabochip studies)

• To calculate:

– β3 = effect size of association between height and CAD mediated through variant

– β3 is the odds ratio for CAD per 1-standard deviation increase in genetically determined height

[Figure axis label: OR for CAD per 1 SD increase in genetically determined height]
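The slides do not state how β3 is derived from β1 and β2. A common choice in this kind of analysis is the ratio of coefficients, β3 = β2 / β1, computed on the log-odds scale and then exponentiated to give an odds ratio; the sketch below assumes that estimator. The function name and all numerical values are hypothetical and are not taken from the study.

```python
import math


def ratio_estimate(beta_height: float, beta_cad: float, se_cad: float):
    """Per-SNP estimate of the effect of height on CAD (ratio of coefficients).

    beta_height: SNP effect on height in SD units (β1)
    beta_cad:    SNP effect on CAD as a log odds ratio (β2)
    se_cad:      standard error of beta_cad
    Returns (β3, SE of β3), where β3 = β2 / β1 is the log odds ratio for CAD per
    1-SD increase in genetically determined height. The SE is a first-order
    approximation that ignores uncertainty in β1.
    """
    if beta_height == 0:
        raise ValueError("the SNP must have a non-zero effect on height")
    beta3 = beta_cad / beta_height
    se3 = abs(se_cad / beta_height)
    return beta3, se3


# Hypothetical SNP: raises height by 0.04 SD and lowers the log odds of CAD by 0.005.
beta3, se3 = ratio_estimate(beta_height=0.04, beta_cad=-0.005, se_cad=0.01)
odds_ratio = math.exp(beta3)   # OR for CAD per 1-SD increase in height
print(beta3, se3, odds_ratio)
```

Note how wide the standard error is relative to the estimate for a single SNP; this is why estimates from many SNPs are usually pooled.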

Height-Associated Variants and CAD - Methods

• The associations of individual SNPs with height (β1) and with CAD (β2) are very small

• Thus, β3 values for individual SNPs are centered around 1.0 and generally not significant

• To determine the complete association between height and CAD, β3 values from all SNPs were combined using inverse-variance-weighted random-effects meta-analysis
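As a rough sketch of how per-SNP estimates can be pooled, the code below implements an inverse-variance-weighted random-effects meta-analysis using the DerSimonian-Laird estimate of between-SNP variance, and also reports I-squared. The choice of the DerSimonian-Laird method is an assumption (the slides only say "random-effects"), and the function name and input values are hypothetical.

```python
import math


def random_effects_meta(estimates, std_errors):
    """Pool estimates (e.g. per-SNP log odds ratios) with a random-effects model.

    Returns (pooled estimate, standard error of the pooled estimate, I-squared in percent).
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # DerSimonian-Laird between-estimate variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau2 to each estimate's own variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, i2


# Hypothetical per-SNP log odds ratios and standard errors (illustration only).
pooled, pooled_se, i2 = random_effects_meta([-0.10, -0.20, -0.15], [0.1, 0.1, 0.1])
pooled_or = math.exp(pooled)
ci_low = math.exp(pooled - 1.96 * pooled_se)
ci_high = math.exp(pooled + 1.96 * pooled_se)
print(pooled_or, ci_low, ci_high, i2)
```

Pooling shrinks the standard error roughly with the square root of the number of SNPs, which is how many individually non-significant estimates can yield a significant combined result.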

Height-Associated Variants and CAD - Results

• Combined association between height-associated SNPs and CAD was significant (OR=0.88, 95% CI = 0.82 to 0.95, p<0.001)

• 13.5% increase in CAD risk per 1-standard deviation (SD) decrease in height

• Most individual β3 values were centered around 1.0 and not significant, but a few were significant (p<0.05)

– 3 out of 180 SNPs remained significant after Bonferroni correction
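The arithmetic behind two of these bullets can be sketched as follows. Inverting the reported odds ratio converts "per 1-SD increase" into "per 1-SD decrease", and the Bonferroni threshold divides the significance level by the number of SNPs tested. With the rounded OR of 0.88 the inversion gives about 13.6%; the reported 13.5% presumably reflects the unrounded estimate, which is an assumption here. The variable names are hypothetical.

```python
# Odds ratio for CAD per 1-SD INCREASE in genetically determined height (as reported).
or_per_sd_increase = 0.88

# Per 1-SD DECREASE the odds ratio is the reciprocal.
or_per_sd_decrease = 1.0 / or_per_sd_increase
pct_increase = (or_per_sd_decrease - 1.0) * 100.0

# Bonferroni correction: divide the significance level by the number of tests.
alpha = 0.05
n_snps = 180
bonferroni_threshold = alpha / n_snps

print(round(pct_increase, 1), bonferroni_threshold)
```

A SNP therefore needs a p-value below roughly 0.0003, rather than 0.05, to remain significant after correction for 180 tests.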

Genetic Risk Score Analysis - Methods

• Subgroup of CAD cohorts had genomewide individual-level genotype data available (8240 cases, 10009 controls)

• Weighted analysis of genetic risk scores to evaluate effect of increasing number of height-associated variants on CAD risk

• Genetic risk score:

– For each SNP, the dosage of the height-increasing allele (a value from 0 to 2, the sum of posterior probabilities) is multiplied by the effect size of the allele on height

– Values totalled across all SNPs for each individual

– Individuals ranked and divided into quartiles

– Logistic regression on quartiles to estimate combined odds ratio for CAD
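The scoring and ranking steps can be sketched as below. This is a minimal illustration under stated assumptions: dosages are taken as already computed (0 to 2 per SNP), the effect sizes and individuals are made up, and the final logistic regression step is left out. The function names are hypothetical.

```python
def genetic_risk_score(dosages, effect_sizes):
    """Weighted score: sum over SNPs of allele dosage (0 to 2) times effect size on height."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one dosage is needed per SNP")
    for d in dosages:
        if not 0.0 <= d <= 2.0:
            raise ValueError("dosages must lie between 0 and 2")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def assign_quartiles(scores):
    """Rank individuals by score and return a quartile label (1 to 4) for each."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    quartiles = [0] * n
    for rank, idx in enumerate(order):
        quartiles[idx] = min(4, rank * 4 // n + 1)
    return quartiles


# Hypothetical effect sizes on height (SD per allele) for three SNPs.
effects = [0.02, 0.05, 0.03]

# Hypothetical dosages for four individuals; fractional values arise from imputation.
individuals = [
    [0.0, 0.0, 0.0],
    [2.0, 2.0, 2.0],
    [1.0, 0.0, 1.0],
    [1.8, 1.0, 0.0],
]

scores = [genetic_risk_score(d, effects) for d in individuals]
quartiles = assign_quartiles(scores)
print(scores, quartiles)
```

Quartile 1 holds the individuals with the lowest scores (fewest height-raising alleles) and serves as the reference group when the quartiles are compared.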

Genetic Risk Score Analysis - Results

• Increased number of height-raising alleles associated with reduced risk of CAD

• Odds ratios for each quartile:

– Quartile 2 vs. Quartile 1 = 0.90 (95% CI = 0.83 to 0.98, p=0.02)

– Quartile 3 vs. Quartile 1 = 0.88 (95% CI = 0.81 to 0.96, p=0.003)

– Quartile 4 vs. Quartile 1 = 0.74 (95% CI = 0.68 to 0.80, p<0.001)

• Quartile 4 includes individuals with highest number of height-raising alleles, Quartile 3 has individuals with second most, etc.

What if SNPs for Height are also associated with CAD risk factors?

• Obtained estimates of effect sizes for 180 height variants on CAD risk factors, based on meta-analyses of genomewide association studies:

– Systolic blood pressure (n=69899)

– Diastolic blood pressure (n=69909)

– Mean arterial pressure (n=29182)

– Pulse pressure (n=74079)

– LDL cholesterol level (n=95454)

– HDL cholesterol level (n=99900)

– Triglyceride level (n=96598)

– Type 2 diabetes (34840 cases, 114981 controls)

– Glucose (n=96496)

– Log-transformed plasma insulin (n=85573)

– Smoking quantity (n=41150)

• β3 values calculated for association of height with CAD risk factors (similar to how they were calculated for overall CAD risk)

Height-Associated Variants and CAD Risk Factors

• β3 values represent change in measurement unit of variable per 1-standard deviation change in height

• Only LDL cholesterol level (β3 = -0.06, 95% CI = -0.09 to -0.04, p<0.001) and triglyceride level (β3 = -0.05, 95% CI = -0.08 to -0.03, p<0.001) had significant associations with height-associated SNPs

• 19% of association between genetically determined height and CAD explained by effect of height on LDL cholesterol

• 12% of association between genetically determined height and CAD explained by effect of height on triglyceride level

Conclusions

• Association between genetically determined decrease in height (sum of 180 height-associated SNPs) and increased risk of CAD (13.5% increase in CAD risk per 1-SD decrease in height)

– 2.3% of this association explained by effect of height on LDL levels (inverse relationship)

– 1.9% of this association explained by effect of height on triglyceride levels (inverse relationship)

• Genetically determined height was associated with CAD risk in men but not in women, in contrast with findings from epidemiological studies suggesting an association in both genders

• Height-associated SNPs were not significantly associated with BMI, suggesting a pathway independent of obesity