Big Data and Superorganism Genomics: Microbial Metagenomics Meets Human Genomics
Disease Genomics
What is genomics?
• Looking at the properties of the genome as a whole
– “Seeing the wood for the trees”: identifying patterns by considering many data points at once.
– Examining large-scale properties requires a model of what is expected just by chance, the null hypothesis.
What is disease genomics?
• OED definition of disease: a condition of the body, or of some part or organ of the body, in which its functions are disturbed or deranged.
• So disease genomics is about taking a whole-genome view of genetic disorders so we can:
– Identify the underlying genetic determinants
– Gain insights into the pathoetiology of the disease
– Select the appropriate treatment
– Prevent disease
Preventive Medicine
• Empower people to make the appropriate lifestyle choices
– 23andMe, Coriell Study
• Treat the cause of the disease rather than the symptoms
– E.g. peptic ulcers
• “All medicine may become pediatrics” (Paul Wise, Professor of Pediatrics, Stanford Medical School, 2008)
• Effects of environment, accidents, aging, penetrance …
– Somatic change: understanding how the genome changes over a lifetime
– Cancer
• Health care costs can be greatly reduced if we:
– Invest in preventive medicine
– Target the cause of disease rather than symptoms
23andMe Spittoon
Human genetic variation
• Substitutions
ACTGACTGACTGACTGACTG
ACTGACTGGCTGACTGACTG
– Single Nucleotide Polymorphisms (SNPs)
• Base pair substitutions found in >1% of the population
• Insertions/deletions (indels)
ACTGACTGACTGACTGACTG
ACTGACTGACTGACTGACTGACTG
– Copy Number Variants (CNVs)
• Indels > 1 kb in size
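As a minimal sketch, the substitution example above can be found by comparing the two sequences position by position. The function name `find_substitutions` is an illustrative assumption; real variant calling works on read alignments rather than on a direct string comparison.

```python
def find_substitutions(reference, sample):
    """Return (position, ref_base, alt_base) for each mismatch, 0-based."""
    if len(reference) != len(sample):
        # A length difference indicates an indel, which this sketch does not handle.
        raise ValueError("sequences differ in length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


# The two sequences from the substitution example on the slide.
ref = "ACTGACTGACTGACTGACTG"
alt = "ACTGACTGGCTGACTGACTG"
print(find_substitutions(ref, alt))  # [(8, 'A', 'G')]
```

The insertion example on the same slide would raise `ValueError` here, which is why indels need an alignment step first.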
Human genetic variation
• Variation can have an effect on function
– Non-synonymous substitutions can change the amino acid encoded by a codon or give rise to premature stop codons
– Indels can cause frame-shifts
– Mutations may affect splice sites or regulatory sequence outside of genes or within introns
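To make the effect of a substitution on a codon concrete, here is a small sketch that classifies a single-codon change using the standard genetic code. The function name and the three category labels are assumptions made for illustration; it looks at one codon in isolation and ignores splice and regulatory effects.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed for codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as synonymous, nonsense or non-synonymous."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    return "non-synonymous"


print(classify_substitution("GAA", "GAG"))  # Glu -> Glu
print(classify_substitution("GAG", "GTG"))  # Glu -> Val
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
```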
How much genetic variation does an individual possess?
1000 Genomes project: A map of human genome variation from population-scale sequencing, Nature 467:1061–1073
• Compared to the Human genome reference sequence, which is itself constructed from 13 individuals
Penetrance of genetic variants
• Highly penetrant Mendelian single-gene diseases
– Huntington’s disease is caused by excess CAG repeats in the huntingtin gene
– Autosomal dominant, 100% penetrant, invariably lethal
• Reduced penetrance: some genes lead to a predisposition to a disease
– BRCA1 and BRCA2 mutations can lead to familial breast or ovarian cancer
– Disease alleles lead to an 80% overall lifetime chance of a cancer, but 20% of patients with the rare defective genes show no cancers
• Complex diseases requiring alleles in multiple genes
– Many cancers (solid tumors) require somatic mutations that induce cell proliferation, mutations that inhibit apoptosis, mutations that induce angiogenesis, and mutations that cause metastasis
– Cancers are also influenced by environment (smoking, carcinogens, exposure to UV)
– Atherosclerosis (obesity, genetic and nutritional cholesterol)
• Some complex diseases have multiple causes
– Genetic vs. spontaneous vs. environment vs. behavior
• Some complex diseases can be caused by multiple pathways
– Type 2 diabetes can be caused by reduced beta-cells in the pancreas, reduced production of insulin, reduced sensitivity to insulin (insulin resistance), as well as environmental conditions (obesity, sedentary lifestyle, smoking, etc.).
Adapted from Nature 461, 747-753 (2009)
The search for disease-causing variants
Inheritance models
[Figure: Dominant vs additive inheritance. x-axis: number of trait alleles inherited (0, 1, 2); y-axis: trait value (0%, 50%, 100%); curves: Dominant, Additive. A second version of the figure adds the labels Healthy and Disease.]
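The two inheritance models can be sketched as functions of the number of trait alleles. The exact values are an assumption read off the figure's axes: the additive model is taken to rise linearly (0%, 50%, 100%), and the dominant model is taken to reach the full trait value with a single allele.

```python
def trait_value(n_alleles, model):
    """Trait value (percent) for 0, 1 or 2 copies of the trait allele."""
    if n_alleles not in (0, 1, 2):
        raise ValueError("a diploid individual carries 0, 1 or 2 copies")
    if model == "additive":
        return 50.0 * n_alleles  # each copy adds the same amount
    if model == "dominant":
        return 100.0 if n_alleles >= 1 else 0.0  # one copy is enough
    raise ValueError(f"unknown model: {model}")


for model in ("dominant", "additive"):
    print(model, [trait_value(n, model) for n in (0, 1, 2)])
```

The models differ only for heterozygotes (one copy), which is why the choice of model matters most when heterozygotes are common.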
Identifying the genetic causes of highly penetrant disorders
• de novo mutations
• Mendelian disorders
de novo mutations
• Humans have an exceptionally high mutation rate of between 7.6 × 10⁻⁹ and 2.2 × 10⁻⁸ per bp per generation
• An average newborn is calculated to have acquired 50 to 100 new mutations in their genome
– This corresponds to about 0.86 novel non-synonymous mutations
• The high frequency of de novo mutations may explain the high frequency of disorders that cause reduced fecundity.
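A rough back-of-the-envelope check of these numbers multiplies the per-bp rate by the genome size. The diploid genome size of 6.4 × 10⁹ bp is an assumption not stated on the slide; with it, the two rate bounds give roughly 49 to 141 new mutations, a range that contains the 50 to 100 quoted above.

```python
# Assumed diploid genome size in base pairs (not given in the slides).
DIPLOID_GENOME_BP = 6.4e9


def expected_new_mutations(rate_per_bp, genome_bp=DIPLOID_GENOME_BP):
    """Expected de novo mutations per generation = rate x genome size."""
    return rate_per_bp * genome_bp


low = expected_new_mutations(7.6e-9)
high = expected_new_mutations(2.2e-8)
print(f"expected new mutations per newborn: {low:.0f} to {high:.0f}")
```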
Look at the epidemiology of the disease for clues

| Disorder | Prevalence (%) | Age of onset | Mortality | Fertility | Heritability | Paternal age effect |
|---|---|---|---|---|---|---|
| Autism | 0.30 | 1 | 2.0 | 0.05 | 0.90 | 1.4 |
| Anorexia nervosa | 0.60 | 15 | 6.2 | 0.33 | 0.56 | n/a |
| Schizophrenia | 0.70 | 22 | 2.6 | 0.40 | 0.81 | 1.4 |
| Bipolar affective disorder | 1.25 | 25 | 2.0 | 0.65 | 0.85 | 1.2 |
| Unipolar depression | 10.22 | 32 | 1.8 | 0.90 | 0.37 | 1 |
| Anxiety disorders | 28.80 | 11 | 1.2 | 0.90 | 0.32 | n/a |

Source: Uher, R. “The role of genetic variation in the causation of mental illness: an evolution-informed framework.” Molecular Psychiatry (2009) Dec;14(12):1072-82.
How do we identify the de novo mutation responsible?
1000 Genomes project: A map of human genome variation from population-scale sequencing, Nature 467:1061–1073
• Compared to the Human genome reference sequence, which is itself constructed from 13 individuals
Identifying a causative de novo mutation
Veltman and colleagues - Nat Genet. 2010 Dec;42(12):1109-12
Patient with idiopathic disorder
(1) Sequence genome: ~22,000 variants (exome re-sequencing)
(2) Select only coding mutations: ~5,640 coding variants
(3) Exclude known variants seen in healthy people: ~143 novel coding variants
(4) Sequence parents and exclude their private variants: ~5 de novo novel coding variants
(5) Look at affected gene function and mutational impact
[Figure: example amino-acid change, MSGTCASTTR vs MSGTNASTTR]
For 6/9 patients, they were able to identify a single likely-causative mutation
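The filtering steps (2) to (4) can be sketched as successive filters over a list of variant records. Everything here is hypothetical and for illustration only: the `Variant` record, the function name, and the toy coordinates are assumptions, not the data or code used in the cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    alt: str
    coding: bool

    @property
    def key(self):
        """Identifier used to match the same variant across call sets."""
        return (self.chrom, self.pos, self.alt)


def candidate_de_novo(patient, known, mother, father):
    """Keep coding variants absent from the known set and from both parents."""
    coding = [v for v in patient if v.coding]                 # step 2
    novel = [v for v in coding if v.key not in known]         # step 3
    return [v for v in novel                                  # step 4
            if v.key not in mother and v.key not in father]


# Toy data with made-up coordinates.
patient = [
    Variant("chr1", 100, "T", coding=True),   # known polymorphism
    Variant("chr2", 200, "G", coding=False),  # non-coding
    Variant("chr3", 300, "A", coding=True),   # inherited from the mother
    Variant("chr4", 400, "C", coding=True),   # remaining de novo candidate
]
known = {("chr1", 100, "T")}
mother = {("chr3", 300, "A")}
father = set()
print(candidate_de_novo(patient, known, mother, father))
```

Step (5), judging gene function and mutational impact, is a manual interpretation step and is not modelled here.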
Mendelian disease
• Definition: diseases in which the phenotypes are largely determined by the action, or lack of action, of mutations at individual loci.
• Rare: 1% of all live-born individuals
• 4 types of inheritance:
– Autosomal dominant
– Autosomal recessive
– X-linked dominant
– X-linked recessive
Mendelian disease
Definitions
SNP: “Single Nucleotide Polymorphism”, a mutation found in >1% of the population that produces a single base pair change in the DNA sequence
Locus: location on the genome
Alleles: alternate forms of a SNP
Genotype: both alleles at a locus form a genotype
Haplotype: the pattern of alleles on a chromosome
Genetic Association: correlation between (alleles/genotype/haplotype) and a phenotype of interest.
[Figure: a pair of chromosomes with SNP alleles, annotated with the terms alleles, genotypes and haplotypes]
Single Nucleotide Polymorphisms (SNPs)
Recombination
[Figure: gametocytes (gamete-producing cells) carry chromosomes A X and a x; after recombination the gametes carry a X and A x. The B/b marker is also shown on each chromosome.]
X/x: unobserved causative mutation
A/a: distant marker
B/b: linked marker
Linkage Disequilibrium & Allelic Association
Markers close together on chromosomes are often transmitted together, yielding a non-zero correlation between the alleles. This is linkage disequilibrium (LD).
It is important for allelic association because it means we don’t need to assay the exact aetiological variant: we can instead see trait-SNP association with a neighbouring variant.
[Figure: markers 1, 2, 3, …, n along a chromosome; labels: LD, D]
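The "non-zero correlation between the alleles" can be quantified with two standard LD measures: D = p(AB) − p(A)·p(B), and r² = D² / (p(A)(1 − p(A)) p(B)(1 − p(B))). The sketch below computes both from a list of phased haplotypes; the function name and the toy haplotypes are assumptions for illustration.

```python
def linkage_disequilibrium(haplotypes, allele_a="A", allele_b="B"):
    """Return (D, r^2) for two loci given phased haplotypes.

    haplotypes: list of (allele at locus 1, allele at locus 2) tuples.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h == (allele_a, allele_b) for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Perfect LD: allele A always travels with allele B.
linked = [("A", "B")] * 5 + [("a", "b")] * 5
# Linkage equilibrium: all four haplotypes equally common.
unlinked = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]
print(linkage_disequilibrium(linked))
print(linkage_disequilibrium(unlinked))
```

In the first case r² is 1, so genotyping either marker tells you the other; in the second r² is 0, so a neighbouring marker carries no information about the causal variant.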
SNPs can be used to track the segregation of regions of DNA
[Figure: aligned DNA sequences from Individuals 1 to 7 that differ at SNP positions in two regions (Locus 1, Locus 2), shown at three stages: the initial population, after time, and after more time (+ recombination).]
SNPs can be used to associate regions of DNA with a trait (disease)

Locus 1

| | Case | Control |
|---|---|---|
| C allele | 0 | 5 |
| T allele | 3 | 2 |

Locus 2

| | Case | Control |
|---|---|---|
| A allele | 2 | 3 |
| G allele | 1 | 4 |
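One way to test allele-count tables this small is Fisher's exact test. The slides do not name a test for this example, so the choice is an assumption made here because the counts are tiny; the implementation below is a plain standard-library sketch of the two-sided test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


p_locus1 = fisher_exact_two_sided(0, 5, 3, 2)  # C/T allele counts, case vs control
p_locus2 = fisher_exact_two_sided(2, 3, 1, 4)  # A/G allele counts, case vs control
print(f"Locus 1: p = {p_locus1:.3f}")
print(f"Locus 2: p = {p_locus2:.3f}")
```

Locus 1 gives the smaller p-value, matching the visual impression that the T allele tracks with the cases, but with only ten chromosomes neither locus reaches conventional significance.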
Genetic Case Control Study
[Figure: individuals grouped into Controls and Cases, each labelled with alleles at two loci (C/G, T/G, T/A, C/A, ...).]
Allele T is ‘associated’ with disease
Measures of Association: The Odds Ratio
• Odds are related to probability: odds = p/(1-p)
– If probability of horse winning race is 50%, odds are 1/1
– If probability of horse winning race is 25%, odds are 1/3 for win or 3 to 1 against win
• If probability of exposed person getting disease is 25%, odds = p/(1-p) = 25/75 = 1/3
• We can calculate an odds ratio = cross-product ratio (“ad/bc”)
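The probability-to-odds conversions above can be checked with exact fractions. This is a minimal sketch; the function names `odds` and `probability` are illustrative.

```python
from fractions import Fraction


def odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("probability must be in [0, 1)")
    return p / (1 - p)


def probability(o):
    """Convert odds back to a probability: o / (1 + o)."""
    o = Fraction(o)
    return o / (1 + o)


print(odds(Fraction(1, 2)))         # 1, i.e. odds of 1/1
print(odds(Fraction(1, 4)))         # 1/3, i.e. 3 to 1 against
print(probability(Fraction(1, 3)))  # 1/4
```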
Odds ratio example: Association of a SNP with the occurrence of Myocardial Infarction

| Variant allele | Disease present | Disease absent |
|---|---|---|
| Present | 813 | 3,061 |
| Absent | 794 | 3,667 |
| Total | 1,607 | 6,728 |

$$\mathrm{OR} = \frac{\text{Odds in Exposed}}{\text{Odds in Unexposed}} = \frac{813 / 3{,}061}{794 / 3{,}667} = \frac{813 \times 3{,}667}{794 \times 3{,}061} = 1.23$$
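The cross-product calculation can be reproduced in a few lines. The 95% confidence interval (Woolf's logit method) is an addition that does not appear on the slide; it is included as a sketch of how the uncertainty around an odds ratio is commonly reported.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with an approximate confidence interval.

    a, b: disease present / absent among carriers of the variant allele
    c, d: disease present / absent among non-carriers
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


or_mi, lower, upper = odds_ratio_with_ci(813, 3061, 794, 3667)
print(f"OR = {or_mi:.2f}")  # OR = 1.23
print(f"approximate 95% CI: {lower:.2f} to {upper:.2f}")
```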
Family-based Linkage Analysis
[Figure: pedigree with individuals labelled by genotype (a/A, a/A, a/A, a/A, A/A, A/A, A/A, a/a) and marked as Healthy or Disease.]
Where is ??? = non-viable so not observed
Family Based Tests of Association
[Figure: trio with parents Aa and AA and child AA]
• Related individuals are from the same family
• We assume we’re tracking the same causative mutation within the family
• Testing for Transmission Disequilibrium
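A common form of the transmission disequilibrium test (TDT) counts, across heterozygous parents of affected children, how often the candidate allele was transmitted versus not transmitted, and compares the two counts with a McNemar-type chi-square statistic (b − c)² / (b + c). The sketch below assumes those counts are already available; the counts 60 and 40 are made up.

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """TDT chi-square (1 df) and p-value from heterozygous-parent transmissions."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


chi2, p_value = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

In the trio shown on the slide, only the Aa parent is informative: the AA parent transmits A regardless, so it contributes nothing to either count.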
Example
Log of the Odds (LOD) score used to define disease locus
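As a simplified illustration of the LOD score, the sketch below compares the likelihood of the observed recombinants at a recombination fraction θ with the likelihood under no linkage (θ = 0.5), on a log10 scale. It assumes phase-known, fully informative meioses, which real pedigrees rarely provide; the counts used are made up.

```python
from math import log10


def lod_score(recombinants, meioses, theta):
    """LOD = log10( L(theta) / L(0.5) ) for phase-known meioses."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    likelihood_linked = theta ** recombinants * (1 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** meioses
    return log10(likelihood_linked / likelihood_unlinked)


# 1 recombinant in 10 meioses, evaluated at theta = 0.1.
print(f"LOD = {lod_score(1, 10, 0.1):.2f}")
# 10 meioses with no recombinants, evaluated at theta = 0.
print(f"LOD = {lod_score(0, 10, 0.0):.2f}")
```

A LOD score of 3 or more (odds of 1000:1 in favour of linkage) is the conventional threshold for declaring linkage, which the second example just reaches.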
Problems
[Figure: trio with parents Aa and AA and child AA]
• Difficult to gather large enough families to get power for testing
• Recombination events near disease locus may be rare
• Resolution often 1-10Mb
• Difficult to get parents for late onset / psychiatric conditions
Genome-wide Association Studies (GWAS)
• Looking for the segregation of disease (case/control) with particular genotypes across a whole population
• A lot of recombination within the population so you can very finely map loci
• Based on the common-disease, common-variant hypothesis
– Only makes sense for moderate effect sizes (odds ratio < 1.5)
GWAS
• Technology makes it feasible
– Affymetrix: 500K; 1M chip arrived 2007 (randomly distributed SNPs)
– Illumina: 550K chip (gene-based)
• Good for moderate effect sizes (odds ratio < 1.5). Particularly useful in finding genetic variations that contribute to common, complex diseases.
Whole Genome Association
• Scan entire genome: 500,000s of SNPs
• Identify local regions of interest; examine genes, SNP density, regulatory regions, etc.
• Replicate the finding
Common disease common variant (CDCV) hypothesis
QQ-plots
Log QQ plot
Tests of association
• Treat genotype as factor with 3 levels, perform 2x3 goodness-of-fit test (Cochran-Armitage). Loses power if additive assumption not true.
• Count alleles rather than individuals, perform 2x2 goodness-of-fit test. Out of favour because
– sensitive to deviation from Hardy-Weinberg equilibrium (HWE)
– risk estimates not interpretable
• Logistic regression
– Easily incorporates inheritance model (additive, dominant, etc)
– Can be used to model multiple loci
[Table layout: rows Case, Control; columns Major allele homozygote (0), Heterozygote (1), Minor allele homozygote (2)]
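For the 2x3 genotype table, one common form of the Cochran-Armitage trend statistic with additive scores (0, 1, 2) can be written with the standard library alone. The genotype counts below are made up for illustration, and the p-value uses the chi-square distribution with 1 degree of freedom.

```python
from math import erfc, sqrt


def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Trend chi-square (1 df) and p-value for a 2 x k table of genotype counts."""
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = n - n_cases
    sum_tr = sum(t * r for t, r in zip(scores, cases))        # scores x case counts
    sum_tn = sum(t * m for t, m in zip(scores, totals))       # scores x column totals
    sum_t2n = sum(t * t * m for t, m in zip(scores, totals))  # squared scores x totals
    chi2 = n * (n * sum_tr - n_cases * sum_tn) ** 2 / (
        n_cases * n_controls * (n * sum_t2n - sum_tn ** 2)
    )
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical counts for genotypes coded 0, 1, 2 (minor allele copies).
cases = (10, 20, 20)
controls = (20, 20, 10)
chi2_trend, p_trend = cochran_armitage_trend(cases, controls)
print(f"trend chi2 = {chi2_trend:.2f}, p = {p_trend:.4f}")
```

Changing `scores` to (0, 1, 1) tests a dominant model instead, which illustrates the slide's point that power depends on the assumed inheritance model.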
http://www.broad.mit.edu/diabetes/scandinavs/type2.html
Genome-Wide Scan for Type 2 Diabetes in a Scandinavian Cohort
HapMap
• Rationale: there are ~10 million common SNPs in the human genome
– We can’t afford to genotype them all in each association study
– But maybe we can genotype them once to catalogue the redundancies and use a smaller set of ‘tag’ SNPs in each association study
• Samples
– Four populations, 270 individuals total
• Genotyping
– 5 kb initial density across genome (600K SNPs)
– Second phase to ~1 kb across genome (4 million)
– All data in public domain
Haplotypes
Nature Genetics 37, 915 - 916 (2005)
Published Genome-Wide Associations through 12/2009: 658 published GWA at p < 5 × 10⁻⁸
NHGRI GWA Catalog: www.genome.gov/GWAStudies
Population Stratification can be a problem
• Imagine a sample of individuals drawn from a population consisting of two distinct subgroups which differ in allele frequency.
• If the prevalence of disease is greater in one sub-population, then this group will be over-represented amongst the cases.
• Any marker which is also of higher frequency in that subgroup will appear to be associated with the disease
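A small numerical sketch shows how this produces a spurious association. All numbers are hypothetical: two subpopulations with different allele frequencies and different disease prevalence, and no association at all inside either one.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds ratio for carrying the allele, cases versus controls."""
    return (freq_cases / (1 - freq_cases)) / (freq_controls / (1 - freq_controls))


def pooled_allele_frequency(subpops, group):
    """Allele frequency among pooled cases (group=0) or controls (group=1)."""
    people = sum(sizes[group] for _, sizes in subpops)
    weighted = sum(freq * sizes[group] for freq, sizes in subpops)
    return weighted / people


# (allele frequency, (number of cases, number of controls)) per subpopulation.
# Within each subpopulation, cases and controls share the same allele frequency.
subpops = [
    (0.8, (200, 800)),  # subgroup A: common allele, 20% prevalence
    (0.2, (50, 950)),   # subgroup B: rare allele, 5% prevalence
]

within = [allelic_odds_ratio(freq, freq) for freq, _ in subpops]
pooled = allelic_odds_ratio(
    pooled_allele_frequency(subpops, 0),
    pooled_allele_frequency(subpops, 1),
)
print("within-subgroup odds ratios:", within)
print(f"pooled odds ratio: {pooled:.2f}")
```

The odds ratio is exactly 1 within each subgroup, yet the pooled analysis gives an odds ratio well above 2, purely because subgroup A is over-represented among the cases.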
Traditional Issues Persist
Allelic heterogeneity
– When multiple disease variants exist at the same gene, a single marker may not capture them well enough.
– Haplotype-based association analysis is good theoretically, but it hasn’t shown its advantage in practice.
Locus heterogeneity
– Multiple genes may influence the disease risk independently. As a result, for any single gene, a fraction of the cases may be no different from the controls.
Effect modification (a.k.a. interaction) between two genes may exist with weak/no marginal effects.
– It is unknown how often this happens in reality. But when this happens, analyses that only look at marginal effects won’t be useful.
– Detecting interaction effects with reasonable power often requires a larger sample size than detecting marginal effects.
Localization
• Linkage analysis yields broad chromosome regions harbouring many genes
– Resolution comes from recombination events (meioses) in families assessed
– ‘Good’ in terms of needing few markers, ‘poor’ in terms of finding specific variants involved
• Association analysis yields fine-scale resolution of genetic variants
– Resolution comes from ancestral recombination events
– ‘Good’ in terms of finding specific variants, ‘poor’ in terms of needing many markers
Linkage vs Association
Linkage
1. Family-based
2. Matching/ethnicity generally unimportant
3. Few markers for genome coverage (300-400 microsatellites)
4. Can be weak design
5. Good for initial detection; poor for fine-mapping
6. Powerful for rare variants
Association
1. Families or unrelateds
2. Matching/ethnicity crucial
3. Many markers required for genome coverage (10⁵ – 10⁶ SNPs)
4. Powerful design
5. Ok for initial detection; good for fine-mapping
6. Powerful for common variants; rare variants generally impossible