Tom Price
MRC SGDP Centre, Institute of Psychiatry
Linkage analysis and eQTL studies
Systems Biomedicine Graduate Programme 2008/9
Genetic Linkage Studies
• Use the inheritance of markers within families to identify chromosomal regions where disease genes may lie
[Figure: disease susceptibility gene and genetic markers M1 to M7]
Linkage Pedigree
Random chance? Or linkage between marker and disease locus?
[Figure: pedigree showing marker genotypes and disease cases]
The Possibilities
• Mendelian: one gene = one trait (e.g. cystic fibrosis)
• Multiple alleles of a single gene: different alleles, different effects (e.g. trinucleotide repeat diseases)
• Non-Mendelian: multiple genes and environment (e.g. epilepsy, liability to stroke)
• Quantitative traits: multiple genes and environment (e.g. height)
[Figure axes: SIMPLE to COMPLEX; DISCRETE to CONTINUOUS]
One Gene, One Trait?
• Laws of heredity discovered by Mendel, 1865
– Three laws of heredity
Mendel’s Laws
1. Dominance
• When two contrasting characters are crossed, only one appears in the next generation
2. Segregation
• For each trait, a gamete carries only one of the two parental alleles
3. Independent assortment
• Alleles for different traits are inherited independently of each other
Dominance for Hair Colour
Segregation
[Figure: parental genotypes AB and CD, with the alleles passed on to offspring]
Independent Assortment
• Eye colour IS NOT predictable from hair colour
– Blonde hair and brown or blue eyes
– Brown hair and blue or brown eyes
Independent Assortment
• Eye colour IS often predictable from hair colour
– Blonde hair and blue eyes
– Brown hair and dark eyes
What is Linkage?
• A method to map the relative positions of two or more loci using genetic markers
– Occurs because loci do not obey Mendel’s third law
Breaking the Third Law
[Figure: pedigree showing ABO blood group alleles (A, B, O) and disease status (affected, unaffected)]
• ABO locus predicts D locus
Adapted from Phillip McClean http://www.ndsu.nodak.edu/instruct/mcclean/plsc431/linkage/
Genetics for Card Players
♠ We can think of genetic information as a deck of cards.
♥ The closer 2 cards are, the less likely it is that they will separate during shuffling.
♣ If not much shuffling has occurred, more distant cards can act as markers.
Linkage Groups
• If inheritance of two loci is independent
– They are unlinked
• If inheritance of two loci is dependent
– They are in the same linkage group
– Linkage groups correspond to the physical structures called chromosomes
Chromosomes
• Chromosomes are NOT inherited as a single block
• Recombination occurs at meiosis
– Affects co-inheritance of alleles
Recombination and Meiosis
• Nearby loci A and B are likely to co-segregate during meiosis.
• Distant loci B and C are less likely to co-segregate during meiosis.
Recombination
• For any pair of markers
– Parental pattern = NR (non-recombinant)
– Mixed pattern = R (recombinant)
Parental genotypes: AaBb and ccdd
Offspring genotypes: AcBd (NR), AcBd (NR), Acbd (R), acBd (R)
Non-recombinant gametes: AB, ab
Recombinant gametes: Ab, aB
Recombination Fraction
= The proportion of offspring that are recombinant between two loci
• RF = 0.5 between unlinked loci (e.g. different chromosomes)
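The definition above is a simple proportion, so it can be sketched in a few lines. The function name and the "R"/"NR" label encoding are illustrative assumptions; the input is the NR NR R R offspring pattern from the recombination slide.

```python
def recombination_fraction(labels):
    """Proportion of scored offspring that are recombinant ("R")."""
    scored = [label for label in labels if label in ("R", "NR")]
    if not scored:
        raise ValueError("no scored offspring")
    recombinants = sum(1 for label in scored if label == "R")
    return recombinants / len(scored)


# Offspring pattern from the recombination slide: NR NR R R
slide_offspring = ["NR", "NR", "R", "R"]
print(recombination_fraction(slide_offspring))
```

With only four offspring the estimate is very noisy; the slide pattern happens to give the value expected for unlinked loci.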
Parametric Linkage Analysis
• Uses pedigree information to estimate recombination fraction between markers and disease
• Assumes a particular model of inheritance (additive, dominant, recessive)
• Useful for Mendelian disorders (single gene)
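As a rough illustration of how a recombination fraction can be estimated and tested, here is a minimal sketch of the standard two-point LOD score for the simplest case: phase-known, fully informative meioses. This is an assumption made for illustration; real pedigree programs handle unknown phase, penetrance and missing data. The function names and the counts (2 recombinants in 20 meioses) are hypothetical.

```python
import math


def two_point_lod(recombinants, non_recombinants, theta):
    """LOD at recombination fraction theta versus free recombination (0.5).

    Assumes phase-known, fully informative meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")

    def log_term(count, prob):
        if count == 0:
            return 0.0
        if prob == 0.0:
            return float("-inf")  # observed events are impossible under theta
        return count * math.log10(prob)

    n = recombinants + non_recombinants
    log_linked = log_term(recombinants, theta) + log_term(non_recombinants, 1.0 - theta)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants, non_recombinants):
    """Maximum-likelihood theta (capped at 0.5) and the LOD at that value."""
    n = recombinants + non_recombinants
    theta_hat = min(recombinants / n, 0.5)
    return theta_hat, two_point_lod(recombinants, non_recombinants, theta_hat)


# Hypothetical data: 2 recombinants among 20 informative meioses
theta_hat, lod = max_lod(2, 18)
print(theta_hat, round(lod, 2))
```

Capping the estimate at 0.5 reflects that unlinked loci give the largest possible recombination fraction.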
Allele Sharing
• People with rare diseases are more highly related to each other near the disease-causing gene than you would typically expect.
• This is because nearby markers tend to be inherited together with the disease locus.
→ We can look for excess allele sharing as a signal that a disease locus is nearby.
Identity By State
• When two individuals possess the same alleles at a locus, they are said to be identical by state (IBS).
• For example, these affected sibs share one allele IBS, the allele a.
Sibling genotypes: ad, ac
• But if the parental genotypes are unknown, we do not know whether the offspring have inherited the a allele from the same parent or from different parents.
• We can’t establish shared inheritance, so IBS allele sharing is useless for linkage analysis.
Parental genotypes: ??, ??; sibling genotypes: ad, ac
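Counting alleles shared identical by state needs only the two genotypes, which is why it says nothing about inheritance. A minimal sketch, assuming each genotype is written as a two-character string of single-letter alleles (the function name is illustrative):

```python
from collections import Counter


def ibs_shared(genotype1, genotype2):
    """Number of alleles (0, 1 or 2) two genotypes share identical by state."""
    # Multiset intersection keeps each allele as many times as it appears in both.
    overlap = Counter(genotype1) & Counter(genotype2)
    return sum(overlap.values())


# The affected sib pair from the slide: genotypes ad and ac share the allele a
print(ibs_shared("ad", "ac"))
```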
Identity By Descent
• Individuals who share copies of a common ancestral allele are said to be identical by descent (IBD).
• For example, these affected sibs share one allele IBD. The paternal allele a has been transmitted to both offspring.
Parental genotypes: ab, cd; sibling genotypes: ad, ac
Allele Sharing in Affected Sib Pair
Parental genotypes: ab, cd; sibling genotypes: ac, ??

Sibling genotypes    Alleles shared IBD    Expected probability
ac, ac               2                     ¼
ac, ad               1                     ½ (both one-allele rows combined)
ac, bc               1
ac, bd               0                     ¼

Probability under random transmission of marker alleles. But what if the marker lies near a disease gene? Affected siblings are more likely to share marker alleles IBD.
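The ¼, ½, ¼ expectations can be checked by enumerating every way two siblings can inherit alleles from fully informative parents such as ab and cd. This is a minimal sketch; the function name and the 0/1 encoding of which parental allele is transmitted are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sib_pair_ibd_distribution():
    """Exact IBD sharing distribution for a sib pair under random transmission."""
    # Each sibling receives one of two paternal and one of two maternal alleles.
    transmissions = list(product((0, 1), repeat=2))  # (paternal, maternal)
    counts = Counter()
    for sib1, sib2 in product(transmissions, repeat=2):
        shared = int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])
        counts[shared] += 1
    total = sum(counts.values())
    return {k: Fraction(counts[k], total) for k in (0, 1, 2)}


print(sib_pair_ibd_distribution())
```

All 16 transmission combinations are equally likely, so the fractions are exact rather than simulated.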
Non-parametric Linkage Analysis
• Uses information on IBD allele sharing
– Usually between affected sibs
• Do not need to specify the model of inheritance at any locus
• Useful for complex traits (multiple genes, different modes of inheritance)
Linkage Statistic for Affected Sib Pairs

Alleles IBD                 0       1       2
Expected                    0.25    0.50    0.25
Observed (under linkage)    Z0      Z1      Z2

Suppose x families share 0 alleles IBD,
y families share 1 allele IBD,
z families share 2 alleles IBD.
Under a multinomial model, the expected probability of the marker data Z0, Z1, Z2 assuming no linkage is
\[ P(Z_0, Z_1, Z_2) = \frac{(x+y+z)!}{x!\,y!\,z!}\; 0.25^{x}\, 0.5^{y}\, 0.25^{z} \]
The multinomial coefficient is the same under both hypotheses, so it cancels in the likelihood ratio:
\[ \mathrm{LOD} = \log_{10} \frac{P(\text{marker data given estimated sharing } Z_0, Z_1, Z_2)}{P(\text{marker data given sharing } 0.25, 0.5, 0.25)} = \log_{10} \frac{Z_0^{x}\, Z_1^{y}\, Z_2^{z}}{0.25^{x}\, 0.5^{y}\, 0.25^{z}} \]
Example: 200 ASPs
Sharing among 200 affected sibling pairs

Alleles shared IBD    0      1      2
Observed sharing      36     90     74
Expected sharing      50     100    50

• Z0 = 36/200 = 0.18
• Z1 = 90/200 = 0.45
• Z2 = 74/200 = 0.37
Recall: baseline values 0.25, 0.5, 0.25
\[ \mathrm{LOD} = \log_{10} \frac{0.18^{36}\; 0.45^{90}\; 0.37^{74}}{0.25^{36}\; 0.5^{90}\; 0.25^{74}} = 3.35 \]
STRONG EVIDENCE FOR LINKAGE
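The arithmetic in this example can be reproduced with a short sketch. It uses the unconstrained sharing estimates, as on the slide; the function name is an illustrative assumption, and dedicated linkage software would normally add constraints and handle uncertain IBD status.

```python
import math


def asp_lod(n0, n1, n2, null=(0.25, 0.5, 0.25)):
    """LOD score for affected sib pairs sharing 0, 1 and 2 alleles IBD.

    Compares the observed sharing proportions with the no-linkage
    expectations; the multinomial coefficient cancels in the ratio.
    """
    counts = (n0, n1, n2)
    total = sum(counts)
    lod = 0.0
    for count, null_prob in zip(counts, null):
        if count > 0:  # a zero count contributes nothing to the likelihood ratio
            lod += count * math.log10((count / total) / null_prob)
    return lod


# The 200 affected sibling pairs from the example: 36, 90 and 74 pairs
print(round(asp_lod(36, 90, 74), 2))
```

Working with sums of logarithms avoids the underflow that would come from raising small probabilities to large powers.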
Complications of Linkage Analysis
• With unknown parental genotypes, allele sharing must be estimated using population allele frequencies
• Families with fewer than four distinct parental alleles may give unclear sharing
• Multipoint linkage analysis, using information from adjacent markers, will increase power to detect genes
• Computationally intensive: use computer programs to calculate LOD scores
• Other problems due to non-paternity, genotyping errors, sample mix-ups, poor phenotype definition
Software
• Several programs are available, including:
– Parametric: LINKAGE, MLINK
– Non-parametric: MERLIN, GENEHUNTER
Linkage Study Design
Candidate gene search: dense marker genotyping within a region of positional or functional interest
Genome search: aim to identify several susceptibility genes
• Families are genotyped on polymorphic markers across all chromosomes
• 300-400 microsatellite markers across the genome, separated by 10 cM (or, more recently, 10,000 SNP markers)
– tighter marker spacing gives more information
– having few markers makes it difficult to reconstruct haplotypes, particularly without parental genotypes
Significance Level
• Lander and Kruglyak (1995) suggested criteria for affected sibling pair studies in complex diseases
LOD score > 2.2 suggestive linkage
LOD score > 3.6 significant linkage
• By chance alone, a LOD score above 2.2 is expected about once per genome search, and a LOD score above 3.6 about once in 20 genome searches
• Many studies of complex disease do not reach these cut-offs
• Another approach is to report highest LOD scores even if they are below these thresholds and look for replication across studies
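The two cut-offs above can be applied mechanically. In this minimal sketch the constant and function names are illustrative assumptions; only the 2.2 and 3.6 values come from the slide.

```python
SUGGESTIVE_LOD = 2.2   # expected by chance about once per genome search
SIGNIFICANT_LOD = 3.6  # expected by chance about once in 20 genome searches


def classify_lod(lod):
    """Label a LOD score against the affected sib pair criteria above."""
    if lod > SIGNIFICANT_LOD:
        return "significant linkage"
    if lod > SUGGESTIVE_LOD:
        return "suggestive linkage"
    return "below suggestive threshold"


for score in (1.5, 3.35, 4.0):
    print(score, classify_lod(score))
```

Under these genome-wide criteria, a LOD score of 3.35 is classed as suggestive rather than significant.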
Does It Work?
• Very powerful for mapping single gene disorders, e.g. early-onset Alzheimer’s Disease, many forms of mental retardation…
• …but many non-replications for complex traits
Linkage v Association
                    Linkage                             Association
Usual sample        Families                            Unrelated individuals (e.g. case control)
Good for finding    Rare variants with large effects    Common variants with small effects
Identifies          Broad chromosomal region            Narrow region, usually within a single gene
Break
• Next up: application of linkage analysis to gene expression phenotypes.
Central Dogma
DNA → mRNA → protein
Finding Disease Pathways
1. Conduct linkage/association study to find candidate
2. Determine candidate gene function experimentally
Problems:
• Markers only give regional information; the identity of the causal variations remains obscure
• Many GWAS hits are nowhere near any genes
• Reliance on animal and in vitro models to probe function
Genetics of Gene Expression
• Linkage study or GWAS using mRNA abundance as the phenotype
Motivation:
• mRNA abundance as ‘endophenotype’
– Lies on causal path between genetic variation and disease
• Hits (‘eQTLs’) may have less complex inheritance
– Larger effect sizes, fewer causal variants?
• We may already know which transcripts are dysregulated in diseased tissues
– eQTLs can provide a link to finding susceptibility genes
The First eQTL Study
Cis Regulation
• Genetic variation near the gene locus that influences its expression
• What we think of as “functional” polymorphisms fall into this category
Trans Regulation
• Genetic variation away from the gene locus that influences its expression
• e.g. polymorphisms in “hub” genes that act as master regulators
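The cis/trans distinction can be sketched as a distance rule between a variant and the gene whose expression it affects. The `Locus` type, the function name and the 1 Mb default window are all illustrative assumptions; the slides do not define a cut-off, and studies choose their own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chromosome: str
    position: int  # base-pair coordinate


def classify_eqtl(variant, gene, cis_window_bp=1_000_000):
    """Label an eQTL "cis" if the variant lies near the gene, else "trans".

    cis_window_bp is an assumed, adjustable cut-off.
    """
    if variant.chromosome != gene.chromosome:
        return "trans"
    if abs(variant.position - gene.position) <= cis_window_bp:
        return "cis"
    return "trans"


gene = Locus("6", 30_000_000)                       # hypothetical gene locus
print(classify_eqtl(Locus("6", 30_050_000), gene))  # nearby variant
print(classify_eqtl(Locus("12", 5_000_000), gene))  # different chromosome
```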
Microarray Experiment
Human eQTL Studies
1st Author | Year | Journal | Population | Sample               | Tissue        | Measure      | Genotyping
Morley     | 2004 | Nature  | CEPH       | 14 pedigrees         | LCL           | 8K Affy      | Linkage scan
Monks      | 2004 | AJHG    | CEPH       | 15 pedigrees         | LCL           | 25K oligo    | Linkage scan
Dixon      | 2007 | Nat Gen | MRC-A      | 206 families         | LCL           | 54K Affy     | Linkage scan
Göring     | 2007 | Nat Gen | SAFHS      | 1240 individuals     | Lymphocytes   | 47K Illumina | Illumina 100K
Stranger   | 2007 | Science | HapMap     | 270 individuals      | LCL           | 47K Illumina | 2M SNPs + 7K CNVs
Emilsson   | 2008 | Nature  | IFB/IFA    | 1002/673 individuals | Blood/Adipose | 25K oligo    | Illumina 370K + Linkage scan
Selected list
• Largest human eQTL study to date (Göring et al. 2007)
• 1,240 subjects from extended pedigrees
• Blood lymphocytes, not lymphocyte cell lines
• 47K Illumina WG-6 Series I microarray
• Expression adjusted for age, sex
Heritability
85% of 19,648 transcripts detected were heritable (FDR 5%)
Cis Regulation
• Single LOD score calculated at gene locus to identify cis-regulated transcripts
• 1,345 (6.8%) cis-regulated transcripts detected (FDR 5%)
• eQTL effect size overall: median 1.8%, mean 5.0%
• eQTL effect size in significant loci: median 24.6%, mean 29.1%
Trans Regulation
• Much lower power
• No evidence of master regulators: only 58 transcripts had 2+ peaks with LOD > 3
Gene Discovery Using eQTLs
Promoter variants in VNN1 are associated with transcript abundance and HDL-C concentration
Consistency of Results
• Morley cis eQTLs confirmed by Göring, but not trans eQTLs.
• This is consistent with tissue specificity of trans regulation, but also with lower power to detect trans effects.
• Linkage & association study (Emilsson et al. 2008)
• Icelandic subjects
• Blood and adipose tissue samples
• Expression adjusted for age, sex, BMI
Tissue Specificity
• Linkage eQTLs (FDR 5%) for 20,877 expression traits, 10,364 of them heritable (FDR 5%)
           Cis      Trans
Blood      2,529    52
Adipose    1,489    25
Both       762      ?
Proximity of Cis Acting Variants
• Association eSNPs were within 100kb of the probe for 96% of expression traits with strong cis-acting effects
Potential Problem
• Microarray probes overlapping SNPs can give rise to spurious cis eQTLs
• Older studies had less resequencing data available to identify probes containing SNPs
Further Directions
• Animal models (e.g. mouse F2 crosses)
• Other tissues (e.g. mouse brain)
• Evoked phenotypes
– genetics of expression response to e.g. ionizing radiation, drug/hormone treatment
• Causal modelling
– Genotype data can establish whether expression changes cause disease or are a consequence of it
Schadt et al. (2005) Nature Genetics 37: 710-717.
Website
http://tomprice.net/
Go to “Presentations”
Download slides