Betwixt and Between; Common and Rare Genetic Variants in Human Disease
Peter Szatmari MD
Offord Centre for Child Studies
McMaster University
McMaster Children’s Hospital
Financial Disclosure
The Canadian Institutes of Health Research
Autism Speaks
Sinneave Family Foundation
Ontario Research Fund
Royalties from Guilford Press
No other sources of funding (stocks, industry, Big Pharma etc)
Objectives
What have we learned about the genetic architecture of ASD?
Focus on explanatory power of common and rare variants
Copy Number Variants as examples of rare risk factors
Neither story provides much explanatory power
So we are “betwixt and between”; what does the future hold? WGS?
What is Genetic Epidemiology?
The study of inherited factors in disease
Combination of epidemiology and statistical genetics
Uses a variety of study designs to meet its objectives
Steps in Genetic Epidemiology
Is the disorder familial? - family studies
Is the familiality due to genetic factors? - twin and adoption studies
Can candidate genes be identified?
Can chromosomal susceptibility regions be identified? - GW linkage and association studies
Exome and whole genome sequencing?
A disease can be genetic without being inherited
The history of autism genetics through these steps
Autism spectrum disorders
A heterogeneous ‘spectrum’ disorder involving deficits in 3 domains of function
4 to 1 sex ratio, more females with severe ID
Changing epidemiology; more non-autism ASD
Changing epidemiology; less frequent ID
Diagnostic substitution occurring
0.6% to 1% prevalence
Increasing prevalence due to better case finding
[Figure: diagram of ASD features. Labels: medical co-morbidities (25-40%); language; anxiety (30-50%); social communication deficits; cognitive deficits; restrictive/repetitive behaviors; strict autism; spectrum]
Family Studies
Recurrence risk (RR) to sibs; 5%, but based on old data collected retrospectively
Stoppage rules; when taken into account, sib RR increases to 10%
Baby sibs studies; RR now 19%
Intermediate phenotypes in another 20%
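To put the sib risks in context, they can be divided by the population prevalence to give a sibling relative risk. The sketch below is illustrative only: the function name and the "lambda_s" label are assumptions, not from the slides, and the 1% prevalence is taken from the upper end of the 0.6% to 1% range quoted earlier.

```python
def sibling_relative_risk(sib_recurrence: float, prevalence: float) -> float:
    """Return sib recurrence risk divided by population prevalence (lambda_s)."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    if not 0 <= sib_recurrence <= 1:
        raise ValueError("sib_recurrence must be in [0, 1]")
    return sib_recurrence / prevalence


# Sib recurrence risks quoted on the slide; prevalence of 1% is an assumption.
sib_risks = {
    "retrospective data": 0.05,
    "adjusted for stoppage": 0.10,
    "baby sibs studies": 0.19,
}
prevalence = 0.01

ratios = {
    label: sibling_relative_risk(risk, prevalence)
    for label, risk in sib_risks.items()
}

for label, ratio in ratios.items():
    print(f"{label}: sibs are at about {ratio:.0f} times the population risk")
```

Using the lower prevalence figure (0.6%) would make every ratio larger, so the choice of prevalence matters as much as the choice of sib risk.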
Twin Studies
Twin studies; traits in general population and in diagnosed twins
Older studies of ASD twins; MZ vs DZ concordance = .65 vs .05
Heritability >90%
Hallmayer et al 2011; MZ vs DZ concordance
Males .58 vs .21
Females .60 vs .27
Greater role for shared environmental factors (55%) than genetic (37%)
The Genetic Architecture of ASD
Some single gene disorders; TS, FraX, NF, etc (5%)
Chromosomal abnormalities spread throughout the genome (5%)
[Figure: 391 cytogenetically visible breakpoints in autism, plotted across chromosomes 1-22, X and Y. Source: http://projects.tcag.ca/autism/]
Breakpoints by type:
Translocation (n=126)
Deletion (n=128)
Inversion (n=37)
Duplication (n=100)
What About the Other 90%?
Little family history of autism, low risk to sibs and twins
Like other genetically “complex” disorders such as CVD, epilepsy, obesity, diabetes, etc
Except that effect on fertility is greater
Two models of genetic complexity
Common disease-common variant
Common disease-rare variant
The Common disease-common variant model; finding genes
Candidate gene studies
Genome wide linkage
GWAS
Common Disease-Common Variant model
Non-syndromic, non-Mendelian ASD is a common disease, therefore it might be caused by common genetic variants
Polygenic multifactorial model; each gene has a small to moderate effect size
Many different variants with an additive effect
Candidate Gene Studies
ASD considered to be “caused” by neurotransmitters; 5HT, dopamine, NE
Focus on genes associated with regulating those neurotransmitters
Hundreds of positive results
Hundreds of non-replications
Small sample sizes, multiple testing of different alleles, marker density, population stratification etc
Linkage Studies
Common variants of moderate to large effect size
Genetic (locus) homogeneity
Focus on affected sib pairs and non-parametric models
Linkage; Parametric Methods
Based on non-independent segregation of genetic markers and disease alleles
Developed for Mendelian disorders
“Log of the odds” (LOD) of linkage vs no linkage (>3.0 is significant)
Need dense families
Accurate classification is essential
Must specify a genetic model (gene frequency, mode of transmission, penetrance)
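The "log of the odds" idea can be sketched for the simplest textbook case: phase-known meioses in which recombinants between marker and disease locus can be counted directly. This is a minimal illustration, not the method used in any of the studies cited; the function name and the example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD = log10 of L(theta) / L(0.5) for phase-known meioses.

    theta is the assumed recombination fraction, in (0, 0.5].
    theta = 0.5 corresponds to no linkage.
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: 1 recombinant in 20 informative meioses.
example_lod = lod_score(recombinants=1, meioses=20, theta=0.05)
print(f"LOD at theta=0.05: {example_lod:.2f}")
print("exceeds 3.0 threshold:", example_lod > 3.0)
```

In practice theta is not known in advance, so the LOD is evaluated over a grid of theta values and the maximum is reported. The need to specify gene frequency, mode of transmission and penetrance enters through the likelihood, which this phase-known shortcut hides.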
Non Parametric methods
Degree of allele sharing among affected relatives, most commonly sibs
Sibs share 0, 1 and 2 alleles at 25%, 50% and 25% (expected when there is no linkage)
Is there distortion in allele sharing?
Model free, less vulnerable to misclassification
Major challenge is power; esp when there is genetic (locus) heterogeneity!
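One simple way to ask whether allele sharing is distorted is a goodness-of-fit test of observed sharing counts in affected sib pairs against the 25% / 50% / 25% expectation. The sketch below assumes sharing can be counted unambiguously at a marker; the counts are hypothetical, and real linkage packages use more refined statistics than this.

```python
import math


def sharing_chi_square(n0: int, n1: int, n2: int) -> tuple[float, float]:
    """Chi-square test (2 df) of sib-pair allele sharing against 1:2:1.

    n0, n1, n2 are the numbers of affected sib pairs sharing 0, 1 and 2
    alleles at the marker. Returns (statistic, p_value).
    """
    total = n0 + n1 + n2
    if total <= 0:
        raise ValueError("need at least one sib pair")
    observed = (n0, n1, n2)
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


# Hypothetical counts: 100 affected sib pairs with excess sharing.
stat, p = sharing_chi_square(n0=10, n1=50, n2=40)
print(f"chi-square = {stat:.1f}, p = {p:.2e}")
```

If only a subset of families carries a risk variant at the locus (locus heterogeneity), the excess sharing in those families is diluted by the rest, which is why power drops so quickly.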
Common Disorder/Common Variant Linkage Studies in ASD
Many genome wide linkage studies using affected sib pairs (using non-parametric methods)
Each with sample size 50 to 400
Many significant linkage peaks but few are replicable
Conclusion; disorder is so heterogeneous and effect of common variant so small we need very large sample sizes
Autism Genome Project
Phase I
Affymetrix 10k SNP genotype data
Linkage analysis in 1146 multiplex autism families
Initial scan for CNV
Phase II
Illumina 1M SNP genotype data
High-resolution scan for de novo and inherited CNV
Genome-wide association analysis
Molecular studies of candidate loci
Problems with Linkage for Complex Disorders
Very sensitive to locus heterogeneity
Low power for loci of small to moderate effects
Very sensitive to misclassification of phenotype
Turn to GWAS; much greater power than linkage for alleles of small effect
Genome Wide Association Studies (GWAS)
1 million genetic markers (SNPs are biallelic markers)
Which markers in which genes are more common in children with ASD than expected?
Trio based or case-control
Are those markers located in genes (or in LD with genes) that are expressed in brain?
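For the case-control design, the question "is this marker more common in cases than expected?" is usually asked one SNP at a time with an allelic odds ratio. The sketch below shows that single-SNP calculation under a normal approximation for the log odds ratio; the function name and the allele counts are hypothetical, and a real GWAS pipeline would add quality control and correction for population stratification.

```python
import math


def allelic_odds_ratio(case_risk: int, case_other: int,
                       control_risk: int, control_other: int):
    """Odds ratio, z statistic and two-sided p-value for a 2x2 allele table.

    Each argument is a count of alleles (two per genotyped person).
    All four counts must be positive.
    """
    counts = (case_risk, case_other, control_risk, control_other)
    if min(counts) <= 0:
        raise ValueError("all allele counts must be positive")
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(sum(1 / c for c in counts))
    z = math.log(odds_ratio) / se_log_or
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return odds_ratio, z, p_value


# Hypothetical counts: 500 cases and 500 controls, so 1000 alleles each.
or_, z_, p_ = allelic_odds_ratio(300, 700, 200, 800)
print(f"OR = {or_:.2f}, z = {z_:.2f}, p = {p_:.1e}")
```

The trio-based alternative compares alleles transmitted and not transmitted from parents to affected children, which avoids the need for a separate control group.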
GWAS
Very successful if MAF >5%
500 SNPs (genetic markers) associated with many common diseases
Eg Type 2 diabetes; 5000 cases and 5000 controls
18 SNPs associated with type 2 diabetes (OR=1.09 to 1.37)
Explain 6% of the heritability
Actual causal variant not discovered
GWAS
Wang et al (2009); cadherin genes at 5p14
Ma et al (2009); also at 5p14 but only in secondary analysis
Weiss et al (2009) 5p15 at SEMA5A
Anney et al (2010) MACROD2
Bottom Line of GWAS?
One SNP barely reaches genome-wide significance (GWS)
No subtype or ASD quantitative trait reaches GWS (especially if correct for multiple testing)
None of the other results can be replicated
But beware of the “Winner’s Curse”!
GWAS very sensitive to allele frequency and allelic heterogeneity
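Why genome-wide significance is so hard to reach follows from the number of tests. As a rough sketch, a Bonferroni correction over the 1 million markers mentioned earlier gives the threshold below. Bonferroni is an assumption here; the slides do not say which correction the cited studies applied, and the p-values in the example are made up.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def reaches_gws(p_value: float, alpha: float = 0.05,
                n_tests: int = 1_000_000) -> bool:
    """True if a single-SNP p-value passes the corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


gws_threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"threshold for 1 million SNPs: {gws_threshold:.0e}")

# Hypothetical p-values, for illustration only.
for p_value in (1e-4, 1e-6, 2e-8):
    print(p_value, "reaches GWS" if reaches_gws(p_value) else "does not reach GWS")
```

Testing several subtypes or quantitative traits multiplies the number of tests again, which is the point of the "especially if correct for multiple testing" caveat.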
[Figure: power as a function of risk allele frequency, for odds ratios of 1.2, 1.4 and 1.6-2.0. Largest sample evaluated in Stage 1: N = 1385 ASD subjects]
The Argument for the Common Variant Model
We should be studying more “familial cases”
We should be using intermediate phenotypes, quantitative traits
We should be looking at gene X gene, gene X environment interactions
We should be looking at parent of origin effects
We should ignore p-values and instead rank order SNPs
All true, next generation of GWAS
The Argument Against the Common Variant Model
ASD is associated with reduced fertility
New variants must arise de novo that are risk factors to keep prevalence stable
If they are new they are rare
Each person carries on average 175 de novo mutations, deletions, duplications that are mostly benign
If a deleterious variant occurs in a brain expressed gene? Might cause ASD
Is ASD a Common Disease/Rare Variant?
ASD is a disorder with reduced fertility
De novo mechanisms of causation (like a spontaneous mutation)
These will necessarily be rare until they diffuse through the population
What is a Rare Event?
Frequency of risk factor <1%
Variation in DNA sequence that affects protein coding
SNP; biallelic marker (by itself or in LD with a DNA sequence)
Structural variant; chromosomal abnormality (ie a CNV, insertions, duplications, translocations etc)
But they might have a big effect size
What are Copy Number Variants (CNV’s)?
Variations in DNA segments >1kb
Deletions, insertions, duplications, others
Rare or common; inherited from parents or arise de novo?
If CNV overlaps a gene expressed in brain, AND it disrupts the function of that gene, it could lead to ASD
“CNV refers to DNA segments for which copy number differences have been observed in the comparison of two or more genomes”
Lee and Scherer, Expert Reviews in Mol. Med. 2010
[Figure: Copy Number Variation (CNV), illustrating a deletion and a duplication. Slide courtesy of Dr. C. Marshall]
Copy Number Variations (CNVs)
• We all have them!
• Most of them do not harm us
• Most of them we inherited from our parents
Rare Variants in ASD
What is the evidence that rare variants, as measured by CNV’s, play a role in ASD?
Simple comparison of “global burden” of brain expressed CNVs or previously implicated CNVs in ASD vs controls
Autism Genome Project
Collaboration of 13 research groups
Pooling of families (1500 families)
Common genotyping (1M SNP’s) and clinical measures (ADI/ADOS) for all affected sib pairs
Funded by Autism Speaks, CIHR, Genome Canada, UK MRC, HRB (Ireland)
Global burden for rare CNVs in cases vs. controls
PLINK v. 1.07, genome-wide P values, one-sided tests, 100,000 permutations
*Pcorr, controlled for global case-control differences, logistic regression
3 measures:
• CNV rate
• Estimated size
• CNV location and # of genes affected
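The logic of a one-sided permutation test for burden can be sketched as follows: compute the case-minus-control difference in a burden measure (for example CNVs per person), then shuffle the case/control labels many times to see how often a difference at least that large arises by chance. This is a generic illustration with made-up data, not PLINK's implementation, and it does not include the correction for global case-control differences mentioned above.

```python
import random


def permutation_p_value(case_values, control_values,
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation p-value for mean(case) > mean(control)."""
    cases = list(case_values)
    controls = list(control_values)
    if not cases or not controls:
        raise ValueError("both groups must be non-empty")

    def mean_difference(values, n_first):
        first, second = values[:n_first], values[n_first:]
        return sum(first) / len(first) - sum(second) / len(second)

    rng = random.Random(seed)
    pooled = cases + controls
    observed = mean_difference(pooled, len(cases))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        if mean_difference(pooled, len(cases)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-person counts of rare genic CNVs.
example_cases = [3, 2, 4, 1, 3, 5, 2, 4, 3, 2]
example_controls = [1, 2, 0, 1, 2, 1, 3, 0, 1, 2]
print("one-sided p =", permutation_p_value(example_cases, example_controls))
```

With 100,000 permutations, as on the slide, the smallest p-value this kind of test can report is about 1 in 100,000.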
CNV burden in known ASD and/or ID genes
Enrichment of genic CNVs in known ASD and ID loci (1.69 fold, P = 3.4 x 10^-4)
[Figure labels: n=46, n=127, n=103]
Genes in which CNV’s have been replicated
Neuroligin 3 and 4
Neurexin
Shank2 and Shank3
Contactin associated protein 2
PTCHD1
Large region on chromosome 16p11
New ones reported each week!
Each one seen in <1% of cases
Range of effects; linked in common networks
ASD and ID risk genes may be linked in a connected pathway
Functional Enrichment Gene-set Map for ASD
Familial segregation - examples
de novo CNV: dup 8p23.3, 791 kb, disrupts DLGAP2
maternal missense mutation at Xp21.3, IL1RAPL1 (A117S, 349G>T); 1/325 cases, 0/250 controls
2 adjacent 17q25.3 de novo CNVs: de novo del 17q25.3 (SLC16A3, CSNK1D); de novo dup 17q25.3, 829 kb, 37 genes
maternal Xp22.11 del in males: DDX53/PTCHD1AS (non-coding RNA for PTCHD1)
*In red if there is previous evidence suggesting gene involvement in ASD or ID
[Pedigree figures for families 5290, 5444 and 5298. Genotype labels: T/G, --/T, G/--, --/G. CNV labels: 121 kb del, 64 kb del, 791 kb dup, 829 kb dup]
MM0088 – MPX family. Proband has 676 kb de novo loss at 16p11.2
SK0019 – SPX family. Proband has 676 kb de novo loss at 16p11.2
SK0102 – SPX family. Proband has 432 kb de novo gain at 16p11.2
III. What does a de novo change mean in a complex disorder?
MPX #62346: De novo 1.2 Mb deletion at 3p25.1, 3.4 Mb deletion at 5p15, t(5;7)(p15;p13)
SPX #HSC0215: De novo 1 Mb deletion at 1p21.3; inherited t(19;21)(p13.q22.1)
[Pedigree figures. Labels: PDD; AD, del 3p25.1, del 5p15; AD, del 1p21.3; t(19;21)]
Prefer multiple lines of evidence supporting locus involvement
MM0160/MM1470-72 [SHANK1 deletion]
[Pedigree figure. Sample labels:]
MM0160_003: blood, 64 kb del
MM0160_001: blood, 64 kb del
MM0160_002: blood, no del
MM0160_005: blood, 64 kb del
MM0160_004: no del (from old DNA); blood, no del
MM1470_004: saliva, 64 kb del, Asperger
MM1470_005: saliva, no del
MM1470_003: blood, 64 kb del
MM0160_006: blood, no del
MM1472_002: blood, no del
MM1470_002: blood, no del
MM1471_002: refused blood collection
MM1472_003: blood, no del
MM1471_003: saliva, no del
MM0160_007: lymphocyte, 64 kb del
MM0160_008: lymphocyte, no loss
CNV’s in ASD
More de novo CNVs in genes implicated in ASD and ID, OR=1.69; 7% of cases vs 4% of controls
Population attributable risk is 3%
Discovered functional networks of genes
In ASD, a shift from neurotransmitters to synaptic genes
Same CNVs seen in ID, epilepsy, ADHD, schizophrenia, BAD (?)
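The roughly 3% population attributable risk can be reproduced from the other two numbers on this slide. The sketch below uses Miettinen's formula, which works from the proportion of cases carrying the risk factor and treats the odds ratio as an approximation to the relative risk. That choice of formula is an assumption; the slides do not say how the 3% was derived.

```python
def population_attributable_fraction(case_exposure: float,
                                     odds_ratio: float) -> float:
    """Miettinen's formula: PAF = p_cases * (RR - 1) / RR, with OR used as RR.

    case_exposure is the proportion of cases carrying the risk factor.
    """
    if not 0 <= case_exposure <= 1:
        raise ValueError("case_exposure must be in [0, 1]")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    return case_exposure * (odds_ratio - 1) / odds_ratio


# Slide values: 7% of cases carry such a CNV, OR = 1.69.
paf = population_attributable_fraction(case_exposure=0.07, odds_ratio=1.69)
print(f"population attributable fraction: {paf:.1%}")
```

A small attributable fraction alongside a raised odds ratio is the pattern expected of rare risk factors: each one matters for the carriers but accounts for few cases overall.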
Next Generation of Studies
Search for rare inherited variants through linkage
CNVs smaller than 1 kb
More complicated structural rearrangements
Whole exome and whole genome sequencing
Current efforts at WGS in ASD identifying variants in another 10-15%?
Rare mutations common in unaffected controls as well
Challenges
Annotation of functional significance of variants
Determination of “causation” when risk factor is rare and disorder is multifactorial
Are the health benefits of identifying rare genetic variants worth the cost? Diagnostics and therapeutics?
Heterogeneity is the main obstacle
Recent findings from WGS
Rare variants are common; due to population overgrowth and weak purifying selection
Most SNVs in the genome are rare
>90% of SNVs detected to be functionally relevant were rare
But it will take huge sample sizes to detect the majority of rare variants involved in disease mechanisms.