Introduction to Genetic Epidemiology
HRM 728 - 2015
Course Coordinator: Dr. Sonia Anand
Course Dataset Assistant: Binod
Course Outline
• 14 classes
• Mid-Term Assignment: 16-October-2015
• Help Session/Analytical Questions using PLINK – Nov 20, 2015
• Final Exam – Dec 4, 2015
• Final Assignment: Independent Study Presentation – Dec 11, 2015
Student Evaluation
• Class Attendance/Participation: 15%
• Mid-Term Assignment: 25% (5-page, single-spaced scholarly summary on a topic preapproved by Dr. Anand)
• Final Exam: 25%
• Independent Study: 35% including class presentation
Seminar 1
• Key Concepts in Genetic Epidemiology: What does genetic epidemiology mean to you?
Figure: genetic epidemiology at the overlap of Biology, Epidemiology, and Statistics
Timeline (~50 years from the DNA structure to the ‘finished’ human genome sequence)
• 1865: Mendel discovers the laws of genetics
• 1900: Rediscovery of Mendel’s genetics
• 1944: DNA identified as the hereditary material
• 1953: DNA structure
• 1960’s: Genetic code
• 1975-79: First human genes isolated
• 1977: Advent of DNA sequencing
• 1986: DNA sequencing automated
• 1990: Human Genome Project officially begins
• 1995: First whole genome
• 1999: First human chromosome
• 2003: ‘Finished’ human genome sequence
The Human Genome Project promised to revolutionise medicine and explain every base of our DNA.
Large MEDICAL GENETICS focus:
• Identify variation in the genome that is disease causing
• Determine how individual genes play a role in health and disease
The Human Genome Project
The two Human Genome Projects
PUBLIC - Watson/Collins
• Human Genome Project
• Officially launched in 1990
• Worldwide effort: both academic and government institutions
• Assemble the genome using maps
• 1996 Bermuda Accord
PRIVATE - Craig Venter
• 1998 Celera Genomics
• Aim to sequence the human genome in 3 years
• ‘Shotgun’ approach: no use of maps for assembly
• Data release NOT to follow the Bermuda principles
The Human Genome Project
It cost 3 billion dollars and took 10 years to complete (5 fewer than initially predicted).
• Currently 3.2 Gb
• Approx 200 Mb still in progress
– Heterochromatin
– Repetitive
• Most recent human genome uploaded February 2009
How Are Traits Transmitted from Parents to Offspring?
• Gregor Mendel’s experiments showed that genes are passed from parents to offspring
– Each parent carries two copies of each gene (a gene pair) that controls a trait
– Each parent contributes one copy from each pair
– The pairs of genes separate from each other during the formation of egg and sperm (meiosis)
– When egg and sperm fuse during fertilization, the genes from mother and father form a new gene pair
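Mendel’s segregation rule can be made concrete with a small Punnett-square calculation. The following is a minimal sketch that assumes a single gene with two alleles written as letters, with uppercase for the dominant allele. The function name `cross` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent1, parent2):
    """Offspring genotype probabilities for a one-gene, two-allele cross.

    Each parent is a two-letter genotype such as "Aa". Each parent passes on
    one allele of its pair with equal probability (segregation), and the two
    transmitted alleles form the offspring's new gene pair.
    """
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" are counted as the same genotype
        counts["".join(sorted(allele1 + allele2))] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

offspring = cross("Aa", "Aa")
```

For two heterozygous parents this gives the familiar 1:2:1 genotype ratio: 1/4 AA, 1/2 Aa, and 1/4 aa.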
• Genes are contained on chromosomes
– Chromosomes are found in the nucleus of human cells and the cells of other higher organisms
– Meiosis separates chromosome pairs during the formation of egg and sperm
Concept of Heritability
• Proportion of a trait’s total variance that is attributable to genetic factors in a particular population
• Trait: quantitative or continuous trait, e.g. height
• “Attributable to” ≠ “caused by”
• If everyone in the population were genetically identical (e.g. all homozygous for the same alleles), genetic factors would not contribute to the “variance” in the trait: heritability = 0. Conversely, if everyone had the same environmental exposure, all of the variance would be genetic: heritability = 1
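Heritability is a ratio of variances, $h^2 = V_G / (V_G + V_E)$. This follows from a simple additive model in which the phenotype is the sum of a genetic component and an environmental component. The simulation below is a sketch under that model, assuming no gene-environment interaction or correlation. The helper `simulate_heritability` is hypothetical. With no genetic variance the estimate is 0, and with no environmental variance it is 1.

```python
import random
from statistics import fmean

def _variance(xs):
    """Population variance (mean squared deviation from the mean)."""
    m = fmean(xs)
    return fmean((x - m) ** 2 for x in xs)

def simulate_heritability(var_g, var_e, n=20_000, seed=1):
    """Estimate h^2 = Var(G) / Var(P) for a simulated additive trait P = G + E.

    var_g: genetic variance (0 if everyone is genetically identical)
    var_e: environmental variance (0 if everyone shares the same exposure)
    Assumes G and E are independent and at least one variance is non-zero.
    """
    rng = random.Random(seed)
    g = [rng.gauss(0, var_g ** 0.5) for _ in range(n)]
    e = [rng.gauss(0, var_e ** 0.5) for _ in range(n)]
    p = [gi + ei for gi, ei in zip(g, e)]  # phenotype = genetic + environmental
    return _variance(g) / _variance(p)
```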
Hardy-Weinberg Law of Population Genetics
• Assume random mating in a population
• In a two-allele system, homozygosity and heterozygosity balance out
• Allele and genotype frequencies will remain the same from generation to generation if:
– Organisms reproduce sexually
– Allele frequencies are the same in both sexes
– Loci segregate independently
– Mating is random with respect to genotype
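Hardy-Weinberg Law of Population Genetics
$p^2 + 2pq + q^2 = 1$ and $p + q = 1$
where $p$ is the frequency of the dominant allele and $q$ is the frequency of the recessive allele in the population. $p^2$, $2pq$, and $q^2$ are the expected frequencies of the homozygous dominant, heterozygous, and homozygous recessive genotypes.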
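In practice, the Hardy-Weinberg equation is used to check genotyped data. The steps are:
1. Estimate $p$ and $q$ from the observed genotype counts.
2. Compute the expected counts $np^2$, $2npq$, and $nq^2$.
3. Compare observed with expected counts using a chi-square test on 1 degree of freedom.

The sketch below assumes a biallelic SNP at which both alleles are observed. The name `hwe_test` is hypothetical. Exact tests are often preferred when an allele is rare.

```python
import math

def hwe_test(n_AA, n_Aa, n_aa):
    """Chi-square test (1 df) of Hardy-Weinberg proportions at a biallelic SNP.

    Assumes both alleles are observed, so no expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1 - p                          # frequency of allele a, since p + q = 1
    expected = (n * p ** 2, n * 2 * p * q, n * q ** 2)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, q, expected, chi2, p_value
```

A sample of 36 AA, 48 Aa, and 16 aa fits the equilibrium exactly ($p = 0.6$). A sample of 50 AA, 0 Aa, and 50 aa, with no heterozygotes, is strongly rejected.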
Genetic Epidemiology: Flow of Research
• Disease characteristics: descriptive epidemiology
• Familial clustering: family aggregation studies
• Genetic or environmental: twin/adoption/half-sibling/migrant studies
• Mode of inheritance: segregation analysis
• Disease susceptibility loci: linkage analysis
• Disease susceptibility markers: association studies
Why do we care about variations? Variations:
• underlie phenotypic differences
• cause inherited diseases
• allow tracking of ancestral human history
October 2004
Human Genome
• ~30,000 genes
• 3 billion base pairs in the human genome
• 15 million SNPs in human genome
• Human Diversity = 0.5%
• Far less than in other species such as the chimpanzee (because humans are a younger species)
• Patterns of linkage disequilibrium (LD) are informative about population histories
SNPs
• SNPs are the more common variants (frequency > 5%)
• Most mutations will disappear, but some will achieve higher frequencies due either to random genetic drift or to selective pressure
• SNPs arise by base substitution through a non-repaired error that occurs during DNA replication
• Low mutation rate: ~$10^{-8}$ substitutions per base pair per generation
• The majority of SNPs are inherited, not de novo mutations
SNP persistence is influenced by 2 forces
• 1) Random genetic drift: random sampling of alleles in each generation (because only a small fraction of gametes pass on to the next generation). Eventually an allele reaches a frequency of 100% (FIXATION) or 0% (loss)
• 2) Natural selection: affects the probability that a SNP is passed to the next generation. Selection ↑ the speed of fixation if the variant confers a fitness advantage (positive selection), ↓ new deleterious variants in the gene pool (negative selection), or maintains several alleles in the population (balancing selection)
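Both forces can be explored with a Wright-Fisher simulation. This is a standard population-genetics model in which each generation's 2N allele copies are drawn at random from the previous generation. The sketch assumes a diploid population of constant size N. It also assumes a simple selection model in which the allele's relative fitness is 1 + s. The function name and parameters are illustrative.

```python
import random

def wright_fisher(p0, n_individuals, generations, s=0.0, seed=None):
    """Simulate one allele's frequency under drift, with optional selection.

    p0: starting allele frequency
    n_individuals: diploid population size N (2N allele copies per generation)
    s: selection coefficient; the allele's relative fitness is 1 + s
       (s > 0 positive selection, s < 0 negative selection, s > -1 required)
    Returns the frequency trajectory, stopping early at fixation (1.0) or loss (0.0).
    """
    rng = random.Random(seed)
    n_copies = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection shifts the expected frequency toward the fitter allele
        p_sel = p * (1 + s) / (1 + p * s)
        # Drift: binomial sampling of 2N copies for the next generation
        count = sum(1 for _ in range(n_copies) if rng.random() < p_sel)
        p = count / n_copies
        trajectory.append(p)
        if p in (0.0, 1.0):  # absorbing states: no variation left to sample
            break
    return trajectory
```

With s = 0 (pure drift), a rare allele is usually lost. Across many runs, it becomes fixed in a proportion of runs close to its starting frequency. A positive s makes fixation more likely and faster.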
Linkage Disequilibrium
• Chromosomes are mosaics
• Patterns of LD are informative about population histories and depend on:
– Recombination rate
– Mutation rate
– Population size
– Natural selection
Conrad Nature Genetics 2006
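LD between two biallelic loci is usually summarized from haplotype and allele frequencies with three measures:
- $D = p_{AB} - p_A p_B$
- its normalized form $D' = D / D_{max}$
- $r^2 = D^2 / [p_A(1-p_A)p_B(1-p_B)]$

The sketch below assumes the haplotype frequency is already known and both loci are polymorphic. In practice, haplotype frequencies are often estimated from unphased genotypes. `ld_stats` is a hypothetical helper.

```python
def ld_stats(p_AB, p_A, p_B):
    """Pairwise LD between biallelic loci A/a and B/b.

    p_AB: frequency of the haplotype carrying alleles A and B
    p_A, p_B: allele frequencies of A and B (strictly between 0 and 1)
    Returns D, signed D' (D divided by its maximum possible magnitude) and r^2.
    """
    d = p_AB - p_A * p_B  # zero when the two loci are independent (no LD)
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2
```

D' reaches ±1 whenever one of the four haplotypes is absent. r² reaches 1 only when the two markers predict each other perfectly, which is why r² is the measure usually used for tagging and imputation.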
Progress in Genetics
• 1866: Gregor Mendel suggested traits were inherited
• 1869: Friedrich Miescher isolated DNA
• 1953: Double helix structure of DNA (Watson, Crick, Rosalind Franklin)
• 1975: Sanger sequencing (”1st Generation”)
• 2003: Human Genome “Crack the Code”
• International HapMap Project
• Automated sequencing
• 1000 Genomes
Background on 1000 Genomes
• International collaboration
• Sequence the whole genome of approximately 2000 individuals from ~20 populations
• Central goal is to describe most of the genetic variation that occurs at a population frequency greater than 1%
• Help scientists:
– Identify genetic variation with high resolution
– Improve imputation
– Find novel genotype-phenotype associations
– Identify causal variants
– More accurately study evolutionary processes and racial differences
The 1000 Genomes Project Consortium (2012). An integrated map of genetic variation from 1,092 human genomes. Nature. DOI: 10.1038/nature11632
Population-specific genetic variation at high resolution
• Observe and identify population-specific genetic variation
• Novel SNPs are rare and more likely to be observed in only one ethnic group, so good coverage is needed in multiple populations
• Identifying such variants can help develop new population-specific arrays. This minimizes the ascertainment bias that currently exists because most arrays are derived from Europeans
Imputation to GWAS
• Provide a resource to aid imputation of missing genotypes in association studies
• From the pilot study, the authors found that each GWAS signal was in LD with 56 variants on average. In 19% of cases, a coding variant was among these LD variants
• This shows that 1000 Genomes data can be used to find potentially functional variants corresponding to GWAS hits
Identification of causal variants
• Precise causative genes are difficult to identify, as GWAS focus on LD / genomic regions
• Deep sequencing studies can help find novel or rare functional variants
• Re-sequencing studies support this approach by uncovering rarer variants with larger effects and functional roles in disease (Nejentsev 2009)
From the Pilot phase
• Describes genomes from 1,092 individuals representing 14 populations across Europe, Africa, Asia, and the Americas
1000 Genomes
Figure: the fraction of variants identified across the project that are:
• found in only one population (white line)
• restricted to a single ancestry-based group (solid colour)
• found in all groups (solid black line)
• found in all populations (dotted black line)
1000 Genomes
• Most common variants were almost always present in all 14 populations
• The degree of rare variation differed greatly between populations
From Genetics to Genomics
Genetics | Genomics
Disease | Information
Single gene disorders | All diseases
Mutations/one gene | Variation/multiple genes
High disease risk | Low disease risk
Environment role +/- | Environment role ++
“Genetic services” | Gene-environment interactions
Common Complex Diseases
• Conditions such as CVD are common
• They include closely related but not identical manifestations: angina, unstable angina, MI
• Multiple genes have small effects (RR of 1.2 to 1.5) and affect multiple “risk factors” or intermediate phenotypes
• The causative genotype may be the more common genotype (unlike monogenic disorders)
What are we trying to study?
"It's a classic scientific paradox: we know a genotype and we know a phenotype, but there's a black box in between"
Figure: the "black box" between SNP variation and disease, made up of gene expression, protein synthesis, post-translational changes, and protein expression. Genetic association studies link SNP variation to disease, and other risk factors also contribute. A second version of the figure adds environmental exposure.
Indirect and Direct Allelic Association
Semantic distinction between:
• Linkage disequilibrium: correlation between (any) markers in a population
• Allelic association: correlation between a marker allele and a trait
Direct Association
• Measure the disease-relevant variant (*) at D directly, ignoring correlated markers nearby
Indirect Association & LD
• Assess trait effects on D via correlated markers (M1, M2, ..., Mn) rather than via the susceptibility/etiologic variants themselves
Marchini, 2004 (www)
Population Stratification
Hunter, 2005 (www)
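Population stratification can be illustrated with a toy example using hypothetical counts. Within each of two subpopulations, carrying a marker allele has no association with disease (OR = 1). However, the subpopulation in which the allele is common also has more disease. Pooling the two groups creates a spurious association. The Mantel-Haenszel odds ratio, which stratifies on subpopulation, recovers the null.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a/b = cases/controls carrying the allele,
    c/d = cases/controls not carrying it."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel OR pooled across strata of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical subpopulations: (carrier cases, carrier controls,
# non-carrier cases, non-carrier controls)
pop1 = (100, 400, 100, 400)  # allele carried by 50%, disease prevalence 20%
pop2 = (5, 95, 45, 855)      # allele carried by 10%, disease prevalence 5%

pooled = tuple(x + y for x, y in zip(pop1, pop2))
crude_or = odds_ratio(*pooled)            # about 1.84: spurious association
mh_or = mantel_haenszel_or([pop1, pop2])  # 1.0: no association within strata
```

In genome-wide studies, ancestry is usually not a simple label. It is therefore often handled by adjusting for genetic principal components rather than by explicit strata.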
Models of gene–environment interactions
Hunter, 2005 (www)
Sample size requirement for gene-environment interaction studies
Hunter, 2005 (www)
An example of a gene-environment interaction
In Alzheimer disease, the risk of cognitive decline as measured by the TICS test is particularly high in APOE4 carriers who have untreated hypertension (APOE4+/HT+).
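Interactions like the one described for APOE4 and hypertension are often quantified by comparing each exposure combination with the doubly unexposed group. The sketch uses made-up counts, not data from the study behind this slide. Two measures are computed:
- $OR_{GE} / (OR_G \times OR_E)$ measures departure from a multiplicative model.
- $RERI = OR_{GE} - OR_G - OR_E + 1$ measures departure from an additive model.

`interaction_measures` is a hypothetical helper.

```python
def interaction_measures(counts):
    """counts maps (gene_carrier, exposed) -> (cases, non-cases).

    Odds ratios are relative to the (False, False) reference group.
    """
    ref_cases, ref_controls = counts[(False, False)]
    ref_odds = ref_cases / ref_controls

    def or_for(key):
        cases, controls = counts[key]
        return (cases / controls) / ref_odds

    or_g = or_for((True, False))   # gene alone (e.g. APOE4+/HT-)
    or_e = or_for((False, True))   # exposure alone (e.g. APOE4-/HT+)
    or_ge = or_for((True, True))   # both (APOE4+/HT+)
    return {
        "OR_G": or_g,
        "OR_E": or_e,
        "OR_GE": or_ge,
        "multiplicative_interaction": or_ge / (or_g * or_e),
        "RERI": or_ge - or_g - or_e + 1,
    }

# Hypothetical counts: (cognitive decline, no decline)
example = {
    (False, False): (50, 500),
    (True, False): (30, 150),
    (False, True): (40, 250),
    (True, True): (60, 100),
}
result = interaction_measures(example)
```

With these numbers, the joint OR (6.0) exceeds both the product (3.2) and the sum minus one (2.6) of the single-exposure ORs. This indicates interaction on both scales.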
Ascertainment Bias
• Case-control studies are particularly prone to ascertainment bias in this scenario. Unlike in a population-based study, cases and controls can be enriched for the factors on which investigators want to focus (in the case of diabetes, hyperglycaemia)
• In the case of TCF7L2 (rs7903146), the T-allele can appear to be associated with lower BMI in control samples. Although the T-allele causes hyperglycaemia, the controls are selected to be normoglycaemic. This leads to an accumulation of T-allele carriers with higher physical activity levels or lower BMI
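The TCF7L2 example is a form of selection (collider) bias. Two independent causes of hyperglycaemia become correlated once the sample is restricted to people who are normoglycaemic. The simulation below sketches that mechanism only. The allele frequency, effect sizes, and glucose cutoff are invented for illustration and are not estimates for rs7903146.

```python
import random
from statistics import fmean

def simulate_selection_bias(n=100_000, seed=7, glucose_cutoff=0.5):
    """Toy model of ascertainment (collider) bias with invented parameters.

    In the full population the T allele and BMI are independent, but both
    raise glucose. Keeping only people below a glucose cutoff (mimicking
    normoglycaemic controls) induces a spurious T-allele/BMI association.
    Returns the carrier minus non-carrier mean BMI in the population and
    in the selected controls.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        t_carrier = rng.random() < 0.3           # hypothetical carrier frequency
        bmi = rng.gauss(0, 1)                    # standardized BMI, unrelated to T
        glucose = 0.8 * t_carrier + 0.6 * bmi + rng.gauss(0, 0.5)
        people.append((t_carrier, bmi, glucose))

    def bmi_gap(group):
        carriers = [b for t, b, _ in group if t]
        others = [b for t, b, _ in group if not t]
        return fmean(carriers) - fmean(others)

    controls = [p for p in people if p[2] < glucose_cutoff]
    return bmi_gap(people), bmi_gap(controls)

population_gap, control_gap = simulate_selection_bias()
```

The population gap is close to zero, while the control gap is clearly negative. Among the selected controls, T-allele carriers have lower BMI on average, even though the model gives BMI and the T allele no relationship.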
Future Directions: Beyond DNA & RNA
*adapted from Ginsburg G, et al. J Am Coll Cardiol. 2005;46:1615-1627.
“Omic” approach | Technology | Number estimated in humans
Genomics | Single nucleotide polymorphisms (SNPs) | ~10,000,000
Transcriptomics | Microarrays of gene transcripts (RNA) | ~20,000
Proteomics | Protein arrays of specific protein products | ~100,000
Metabolomics | Metabolic profiles | 1,000-10,000 metabolites
A 1951 paper by Gertler et al. reported that individuals who suffered a myocardial infarction before the age of 40 were on average 5 cm (2.9%) shorter than a healthy control population
Gertler MM, Garn SM, White PD
The Journal of the American Medical Association 1951
Short stature is associated with coronary heart disease: a systematic review of the literature and a meta-analysis.
Paajanen TA, Oksala NKJ, Kuukasjärvi P, Karhunen PJ
European Heart Journal 2010
Methods
• Selection of studies for review: systematic reviews, meta-analyses, randomized clinical trials, clinical trials, and cohort or case-control studies with at least 200 subjects
– Height dichotomized into short and tall groups
– Outcome defined as any of the following:
· diagnosis of angina pectoris
· ischaemic heart disease (IHD) or heart disease without MI
· acute MI or history of MI
· coronary artery occlusion of 50% or more
· revascularization or percutaneous transluminal coronary angioplasty (PTCA)
· all-cause mortality, CVD mortality, or CHD mortality
• Meta-analysis: I-squared test for heterogeneity of data. ORs and RRs from all studies converted to RRs for the shorter group
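The I-squared statistic mentioned in the methods is derived from Cochran's Q. Q is the weighted sum of squared deviations of the study estimates (here log RRs) from the inverse-variance pooled estimate. Then $I^2 = \max(0, (Q - df)/Q)$ with $df = k - 1$ for $k$ studies. The sketch below assumes each study supplies a log RR and its standard error. The inputs in the usage example are hypothetical, not the studies in this meta-analysis.

```python
import math

def heterogeneity(log_rr, se):
    """Fixed-effect pooled RR, Cochran's Q and I^2 for k studies.

    log_rr: per-study log relative risks; se: their standard errors.
    """
    weights = [1 / s ** 2 for s in se]                 # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, log_rr)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rr))
    df = len(log_rr) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0      # share of variation beyond chance
    return math.exp(pooled), q, i2

rr, q, i2 = heterogeneity([0.0, 1.0], [0.1, 0.1])  # two clearly discordant studies
```

Identical study estimates give $I^2 = 0$. Two precise but discordant studies give a value near 1, meaning almost all of the observed variation reflects true between-study differences rather than chance.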
Results
• Average cut-off for shorter group was 160.5 cm and cut-off for taller group was 173.9 cm, with different ranges for men and women
• Combined RR for shorter group to experience CHD was 1.46 (95% CI 1.37–1.55)
• Combined RR for all-cause mortality for short men was 1.37 (1.29–1.46) and for short women 1.55 (1.41–1.70)
• Combined RR for all types of cardiovascular (CVD) deaths among men and women was 1.55 (95% CI 1.37–1.74)
• Overall, short stature is associated with a ~1.5-fold increased risk of CHD morbidity and mortality compared with tall stature
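When only a ratio estimate and its 95% CI are reported, the standard error on the log scale can be recovered as $SE = (\ln U - \ln L) / (2 \times 1.96)$. This also gives a z statistic and a p-value. The sketch below applies this to the combined RR for CHD in the shorter group (1.46, 95% CI 1.37-1.55). Because the published numbers are rounded, the results are approximate. `summarize_ratio` is a hypothetical helper.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def summarize_ratio(estimate, lower, upper):
    """Recover log-scale SE, z statistic and two-sided p-value from a
    ratio estimate (RR or OR) and its 95% confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return se, z, p

se, z, p = summarize_ratio(1.46, 1.37, 1.55)  # combined RR for CHD, shorter group
```

This gives an SE of about 0.031 on the log scale and a z statistic of about 12. The pooled association is therefore very unlikely to be due to chance, although this says nothing about confounding.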
New Approach to Crack the Question
Using a genetic approach to explore the association between height and CAD risk helps remove some of the lifestyle and environmental confounders present in epidemiological studies
• Background: 180 single-nucleotide polymorphisms (SNPs) were found to be significantly associated with height (GIANT study in Europeans, n=183,727)
• Aims:
– Assess the combined effect of the 180 height-associated SNPs on CAD risk
– Assess the effect of these SNPs on CAD risk factors (e.g. blood pressure, LDL, etc.)
– Identify any biological pathways mediating this association
Nelson NEJM 2015
Study Population
• Summary association statistics extracted from 3 meta-analyses of GWAS case-control studies of CAD:
– Coronary Artery Disease Genomewide Replication and Meta-Analysis (CARDIoGRAM) Consortium: 21,977 cases, 62,289 controls; all 180 SNP variants
– Coronary Artery Disease (C4D) Consortium: 17,766 cases, 17,115 controls; all 180 SNP variants
– Metabochip (combined CARDIoGRAM+C4D Consortium, for cohorts not included in previous meta-analyses): 25,323 cases, 48,979 controls; 112 SNP variants
Nelson NEJM 2015
Advantages of the genetic approach in this study over the traditional epidemiologic approach:
- Genetic determinants of height are not confounded by lifestyle (e.g. nutrition) or environmental (e.g. socioeconomic status) factors
- Allows tracing of genetic pathways to identify potential mechanisms driving the association
Limitations:
- Lifestyle and environmental choices/events can be a direct consequence of height
Height-Associated Variants and CAD - Methods
• Using:
– β1 = effect size of the association between variant and height (GIANT study)
– β2 = effect size of the association between variant and CAD (CARDIoGRAM, C4D, and Metabochip studies)
• To calculate:
– β3 = effect size of the association between height and CAD mediated through the variant
– β3 is expressed as the odds ratio for CAD per 1-standard deviation increase in genetically determined height
Figure: OR for CAD per 1 SD increase in genetically determined height
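One common way to obtain such a per-variant estimate is the ratio (Wald) estimator used in Mendelian randomization. This is an assumption here, since the slides do not give the formula. The estimator is $\beta_3 = \beta_2 / \beta_1$ on the log-odds scale, and the odds ratio per 1-SD increase in height is $\exp(\beta_3)$. The sketch uses a first-order standard error that ignores uncertainty in $\beta_1$. The input values are hypothetical.

```python
import math

def wald_ratio(beta_height, beta_cad, se_cad):
    """Ratio estimate of the height->CAD effect from one variant.

    beta_height: per-allele effect on height in SD units (beta1)
    beta_cad: per-allele log odds ratio for CAD (beta2)
    se_cad: standard error of beta_cad
    Returns log-OR per 1-SD increase in height (beta3) and a first-order SE.
    """
    beta3 = beta_cad / beta_height
    se3 = se_cad / abs(beta_height)  # ignores the (usually small) SE of beta1
    return beta3, se3

# Hypothetical variant: +0.05 SD height per allele, log-OR -0.0064 for CAD
b3, se3 = wald_ratio(0.05, -0.0064, 0.004)
or_per_sd = math.exp(b3)  # about 0.88 for these made-up inputs
```

The estimated SE (0.08) is large relative to the estimate. This illustrates why single-SNP β3 values are rarely significant on their own and must be combined across many variants.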
Height-Associated Variants and CAD - Methods
• The associations of individual SNPs with height (β1) and with CAD (β2) are very small
• Thus, β3 values (as odds ratios) for individual SNPs are centered around 1.0 and generally non-significant
• To determine the overall association between height and CAD, β3 values from all SNPs were combined using inverse-variance-weighted random-effects meta-analysis
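Inverse-variance-weighted random-effects pooling is commonly done with the DerSimonian-Laird estimator of the between-SNP variance $\tau^2$. The sketch below assumes per-SNP ratio estimates (log-OR per 1-SD height) and their standard errors are already available. The function name is hypothetical, and the study's exact software and settings are not specified here.

```python
import math

def random_effects_ivw(estimates, ses):
    """DerSimonian-Laird random-effects inverse-variance pooling.

    estimates: per-SNP ratio estimates (log-OR per 1-SD height)
    ses: their standard errors
    Returns pooled estimate, its SE, and tau^2 (between-SNP variance).
    """
    w = [1 / s ** 2 for s in ses]                        # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sw
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    df = len(estimates) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # truncated at zero
    w_re = [1 / (s ** 2 + tau2) for s in ses]            # random-effects weights
    pooled = sum(wi * b for wi, b in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, se, tau2

pooled, se, tau2 = random_effects_ivw([-0.1] * 4, [0.2] * 4)
```

When the SNP estimates agree, $\tau^2$ is 0 and the result matches fixed-effect pooling. Heterogeneous estimates inflate $\tau^2$ and widen the pooled confidence interval. The pooled log-OR can then be exponentiated to give the OR per 1-SD increase in genetically determined height.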
Height-Associated Variants and CAD - Results
• Combined association between height-associated SNPs and CAD was significant (OR per 1-SD increase in genetically determined height = 0.88, 95% CI = 0.82 to 0.95, p<0.001)
• 13.5% increase in CAD risk per 1-standard deviation (SD) decrease in height
• Most individual β3 values were centered around 1.0 and non-significant, but a few were significant (p<0.05). 3 of the 180 SNPs remained significant after Bonferroni correction
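The 13.5% figure follows from inverting the odds ratio. An OR of 0.88 per 1-SD increase in genetically determined height corresponds to an OR of 1/0.88 per 1-SD decrease, which is strictly an increase in odds. Using the rounded OR gives about 13.6%. This is close to the reported 13.5%, which was presumably computed from the unrounded estimate. The Bonferroni threshold for 180 SNP-level tests is 0.05/180.

```python
or_per_sd_increase = 0.88
ci_increase = (0.82, 0.95)

# Per 1-SD decrease in height: invert the OR and swap the CI bounds
or_per_sd_decrease = 1 / or_per_sd_increase
ci_decrease = (1 / ci_increase[1], 1 / ci_increase[0])
pct_increase_in_odds = (or_per_sd_decrease - 1) * 100

# Bonferroni correction: divide the family-wise alpha by the number of tests
n_snps = 180
bonferroni_alpha = 0.05 / n_snps
```

The inverted interval is roughly 1.05 to 1.22 per 1-SD decrease. The per-SNP significance threshold is about 2.8e-4.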
Genetic Risk Score Analysis - Methods
• A subgroup of CAD cohorts had genomewide individual-level genotype data available (8,240 cases, 10,009 controls)
• Weighted analysis of genetic risk scores to evaluate the effect of an increasing number of height-associated variants on CAD risk
• Genetic risk score:
– For each SNP, the dosage of the height-increasing allele (a value from 0 to 2, the sum of posterior genotype probabilities) was multiplied by that allele's effect size on height
– Values were totalled across all SNPs for each individual
– Individuals were ranked and divided into quartiles
– Logistic regression on quartiles was used to estimate the combined odds ratio for CAD
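The weighted score and quartile analysis can be sketched as follows. This is an illustration under stated assumptions, not the study's code:
- Dosages are expected allele counts (0 to 2).
- Weights are per-allele effects on height.
- Quartiles are assigned by rank. Quartiles 0-3 here correspond to Quartiles 1-4 on the slides.
- Odds ratios are crude comparisons with the lowest quartile. These equal the estimates from an unadjusted logistic regression on quartile indicator variables.

All function names are hypothetical.

```python
def weighted_grs(dosages, weights):
    """Weighted genetic risk score for one individual.

    dosages: expected count (0 to 2) of the height-increasing allele at each SNP,
             e.g. the sum of posterior genotype probabilities from imputation
    weights: per-allele effect size of each SNP on height
    """
    return sum(d * w for d, w in zip(dosages, weights))

def quartiles(scores):
    """Assign each score to quartile 0-3 by rank (0 = lowest scores)."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    groups = [0] * len(scores)
    for rank, idx in enumerate(order):
        groups[idx] = rank * 4 // len(scores)
    return groups

def quartile_odds_ratios(groups, is_case):
    """Crude OR for CAD in quartiles 1-3 versus quartile 0 (the reference).

    Equals the estimate from an unadjusted logistic regression on quartile
    indicator variables. Assumes every quartile has cases and controls.
    """
    def odds(k):
        cases = sum(1 for g, c in zip(groups, is_case) if g == k and c)
        controls = sum(1 for g, c in zip(groups, is_case) if g == k and not c)
        return cases / controls

    reference = odds(0)
    return {k: odds(k) / reference for k in (1, 2, 3)}
```

In a real analysis, the logistic regression would usually also include covariates such as study or principal components. The crude ORs here are only the unadjusted analogue.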
Genetic Risk Score Analysis - Results
• An increased number of height-raising alleles was associated with reduced risk of CAD
• Odds ratios for each quartile versus Quartile 1:
– Quartile 2: 0.90 (95% CI = 0.83 to 0.98, p=0.02)
– Quartile 3: 0.88 (95% CI = 0.81 to 0.96, p=0.003)
– Quartile 4: 0.74 (95% CI = 0.68 to 0.80, p<0.001)
• Quartile 4 includes individuals with the highest number of height-raising alleles, Quartile 3 those with the second most, etc.
What if SNPs for Height are also associated with CAD risk factors?
• Obtained estimates of the effect sizes of the 180 height variants on CAD risk factors from meta-analyses of genomewide association studies:
– Systolic blood pressure (n=69,899)
– Diastolic blood pressure (n=69,909)
– Mean arterial pressure (n=29,182)
– Pulse pressure (n=74,079)
– LDL cholesterol level (n=95,454)
– HDL cholesterol level (n=99,900)
– Triglyceride level (n=96,598)
– Type 2 diabetes (34,840 cases, 114,981 controls)
– Glucose (n=96,496)
– Log-transformed plasma insulin (n=85,573)
– Smoking quantity (n=41,150)
• β3 values were calculated for the association of height with each CAD risk factor, in the same way as for overall CAD risk
Height-Associated Variants and CAD Risk Factors
• β3 values represent the change in the measurement unit of the variable per 1-standard deviation change in height
• Only LDL cholesterol level (β3 = -0.06, 95% CI = -0.09 to -0.04, p<0.001) and triglyceride level (β3 = -0.05, 95% CI = -0.08 to -0.03, p<0.001) had significant associations with height-associated SNPs
• 19% of the association between genetically determined height and CAD was explained by the effect of height on LDL cholesterol
• 12% of the association between genetically determined height and CAD was explained by the effect of height on triglyceride level
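One simple way to express how much of an association runs through a risk factor is the product-of-coefficients approach. This is an assumption here, since the slides do not state the exact method. The indirect effect is the effect of height on the risk factor multiplied by the effect of the risk factor on CAD. Dividing it by the total effect of height on CAD gives the proportion explained. In the sketch:
- -0.06 is the β3 for LDL from the slide.
- log(0.88) is the total log-OR per 1-SD increase in height.
- The LDL-to-CAD effect of 0.3 is a made-up placeholder.

The resulting share is therefore illustrative only and is not meant to reproduce the published percentages.

```python
import math

def proportion_mediated(beta_height_on_rf, beta_rf_on_cad, beta_height_on_cad):
    """Share of the total height->CAD effect that runs through one risk factor.

    beta_height_on_rf: change in the risk factor per 1-SD increase in height
    beta_rf_on_cad: log-OR for CAD per unit increase in the risk factor
    beta_height_on_cad: total log-OR for CAD per 1-SD increase in height
    """
    indirect = beta_height_on_rf * beta_rf_on_cad
    return indirect / beta_height_on_cad

share = proportion_mediated(-0.06, 0.3, math.log(0.88))  # 0.3 is hypothetical
```

Both the indirect and total effects are negative here (taller stature lowers LDL and lowers CAD risk), so the share comes out positive.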
Conclusions
• A genetically determined decrease in height (sum of 180 height-associated SNPs) was associated with increased risk of CAD (13.5% increase in CAD risk per 1-SD decrease in height)
– 2.3% of this association was explained by the effect of height on LDL levels (inverse relationship)
– 1.9% of this association was explained by the effect of height on triglyceride levels (inverse relationship)
• Genetically determined height was associated with CAD risk in men but not in women. This contrasts with findings from epidemiological studies suggesting an association in both sexes
• Height-associated SNPs were not significantly associated with BMI, suggesting a pathway independent of obesity