Intro to Quantitative Genetics HGEN502, 2011 Hermine H. Maes.

Post on 16-Dec-2015


Intro to Quantitative Genetics

HGEN502, 2011

Hermine H. Maes

Intro to Quantitative Genetics

1/18: Course introduction; Introduction to Quantitative Genetics & Genetic Model Building

1/20: Study Design and Genetic Model Fitting

1/25: Basic Twin Methodology

1/27: Advanced Twin Methodology and Scope of Genetic Epidemiology

2/1: Quantitative Genetics Problem Session

Aims of this talk

Historical Background

Genetical Principles
- Genetic parameters: additive, dominance
- Biometrical model

Statistical Principles
- Basic concepts: mean, variance, covariance
- Path analysis
- Likelihood

Quantitative Genetics Principles

Analysis of patterns and mechanisms underlying variation in continuous traits, to resolve and identify their genetic and environmental causes

Continuous traits have a continuous phenotypic range; they are often polygenic and influenced by environmental effects

Ordinal traits are expressed in whole numbers; they can be treated as approximately continuous or as threshold traits

Some qualitative traits can be treated as having an underlying quantitative basis, expressed as a threshold trait (or multiple thresholds)

Types of Genetic Influence

Mendelian Disorders: single gene, highly penetrant, severe, small % affected (e.g., Huntington’s Disease)

Chromosomal Disorders: extra or missing chromosomes, or insertions and deletions of chromosomal sections; severe, small % affected (e.g., Down’s Syndrome, trisomy 21)

Complex Traits: multiple genes (of small effect) plus environment; a large % of the population affected; susceptibility, not destiny (e.g., depression, alcohol dependence, etc.)

Genetic Disorders

Great 19th Century Biologists

Gregor Mendel (1822-1884): Mathematical rules of particulate inheritance (“Mendel’s Laws”)

Charles Darwin (1809-1882): Evolution depends on differential reproduction of inherited variants

Francis Galton (1822-1911): Systematic measurement of family resemblance

Karl Pearson (1857-1936): “Pearson Correlation”; graduate student of Galton

Family Measurements

Standardize Measurement

Pearson and Lee’s diagram for measurement of “span” (finger-tip to finger-tip distance)

From Pearson and Lee (1903) p.378

Parent Offspring Correlations

From Pearson and Lee (1903) p.387

Sibling Correlations

© Lindon Eaves, 2009

Nuclear Family Correlations

Quantitative Genetic Strategies

Family Studies: does the trait aggregate in families? The (really!) big problem: families are a mixture of genetic and environmental factors

Twin Studies: Galton’s solution was twins. One (ideal) solution is twins separated at birth, but unfortunately MZAs (MZ twins reared apart) are rare. An easier solution is MZ and DZ twins reared together

Twin Studies Reared Apart

Minnesota Study of Twins Reared Apart (T. Bouchard et al., 1979): >100 sets of reared-apart twins from across the US & UK. All pairs spent their formative years apart (but vary tremendously in amount of contact prior to the study). 56 MZAs participated

Types of Twins

Monozygotic (MZ; “identical”): result from fertilization of a single egg by a single sperm; share 100% of genetic material

Dizygotic (DZ, “fraternal” or “non-identical”): result from independent fertilization of two eggs by two sperm; share on average 50% of their genes

Logic of Classical Twin Study

MZs share 100% of their genes, DZs (on average) 50%; both twin types are assumed to share 100% of their common (family) environment

If rMZ > rDZ, then genetic factors are important

If rDZ > ½ rMZ, then growing up in the same home is important

If rMZ < 1, then non-shared environmental factors are important
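The three rules above are qualitative comparisons of twin correlations. A minimal sketch that applies them literally is shown below; the function name is hypothetical, and the rules ignore sampling error, so in practice they only suggest which variance components a formal model should include. The usage line plugs in the stature correlations reported in this lecture (r = 0.924 for MZ pairs, r = 0.535 for DZ pairs).

```python
def classical_twin_logic(r_mz: float, r_dz: float) -> list[str]:
    """Apply the three qualitative rules of the classical twin design (hypothetical helper)."""
    conclusions = []
    if r_mz > r_dz:
        conclusions.append("genetic factors are important")
    if r_dz > 0.5 * r_mz:
        conclusions.append("shared (home) environment is important")
    if r_mz < 1:
        conclusions.append("non-shared environmental factors are important")
    return conclusions


# Stature correlations from the Virginia Twin Study scatterplots in this lecture
for conclusion in classical_twin_logic(r_mz=0.924, r_dz=0.535):
    print(conclusion)
```

With these values all three conditions hold, because 0.535 exceeds ½ × 0.924 = 0.462.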

Causes of Twinning

For MZs, twinning appears to be random. For DZs, twinning:
- increases with mother’s age (follicle stimulating hormone, FSH, levels increase with age)
- is influenced by hereditary factors (FSH)
- is increased by fertility treatment

Rates of twins/multiple births are increasing, currently ~3% of all births

Zygosity of Twins

Chorionicity of Twins

100% of DZ twins are dichorionic; ~1/3 of MZ twins are dichorionic and ~2/3 are monochorionic

[Figure: scatterplot of age- and sex-corrected stature in DZ twins (HTDEV1 vs HTDEV2), r = 0.535; Virginia Twin Study of Adolescent Behavioral Development]

[Figure: scatterplot of corrected stature in MZ twins (HTDEV1 vs HTDEV2), r = 0.924]

Twin Correlations

[Figure: MZ and DZ twin correlations for stature]


Ronald Fisher (1890-1962)

1918: On the Correlation Between Relatives on the Supposition of Mendelian Inheritance

1921: Introduced the concept of “likelihood”

1930: The Genetical Theory of Natural Selection

1935: The Design of Experiments

Fisher developed the mathematical theory that reconciled Mendel’s work with Galton and Pearson’s correlations

Fisher (1918): Basic Ideas

Continuous variation is caused by lots of genes (polygenic inheritance)
- Each gene follows Mendel’s laws
- Environment smooths out genetic differences
- Genes may show different degrees of dominance
- Genes may have many forms (multiple alleles)
- Mating may not be random (assortative mating)

Fisher showed that the correlations obtained by Pearson & Lee were explained well by polygenic inheritance [“Mendelian” crosses with quantitative traits]

Biometrical Genetics

Lots of credit to: Manuel Ferreira, Shaun Purcell, Pak Sham, Lindon Eaves

Revisit common genetic parameters - such as allele frequencies, genetic effects, dominance, variance components, etc

Use these parameters to construct a biometrical genetic model

Model that expresses the:

(1) Mean

(2) Variance

(3) Covariance between individuals

for a quantitative phenotype as a function of genetic parameters.

Building a Genetic Model

[Figure: diagram of genotypes (G) and phenotypes (P) at the population, transmission and phenotype levels]

Population level: allele and genotype frequencies

Transmission level: Mendelian segregation; genetic relatedness

Phenotype level: biometrical model; additive and dominance components

Genetic Concepts

Population level

1. Allele frequencies

A single locus with two alleles
- Biallelic / diallelic
- e.g., a single nucleotide polymorphism (SNP)

Alleles A and a
- Frequency of A is p
- Frequency of a is q = 1 – p

Every individual inherits two alleles
- A genotype is the combination of the two alleles
- e.g., AA, aa (the homozygotes) or Aa (the heterozygote)

2. Genotype frequencies (Random mating)

Allele 1 \ Allele 2 | A (p)   | a (q)
A (p)               | AA (p²) | Aa (pq)
a (q)               | aA (qp) | aa (q²)

Hardy-Weinberg Equilibrium frequencies

$$P(AA) = p^2, \qquad P(Aa) = 2pq, \qquad P(aa) = q^2$$

$$p^2 + 2pq + q^2 = 1$$
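The Hardy-Weinberg frequencies follow directly from pairing two independently drawn alleles. A minimal sketch, assuming random mating and a single biallelic locus with frequency p for allele A (the function name is hypothetical):

```python
def hardy_weinberg(p: float) -> dict[str, float]:
    """Genotype frequencies under random mating, given frequency p of allele A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    # Aa and aA are the same unordered genotype, hence 2pq
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


freqs = hardy_weinberg(0.7)
print(freqs)                 # roughly 0.49, 0.42 and 0.09
print(sum(freqs.values()))   # p^2 + 2pq + q^2 = 1, up to floating-point rounding
```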

Population level

Transmission level

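Mendel’s experiments

Intercross: pure lines AA × aa give an F1 of Aa; crossing F1 × F1 (Aa × Aa) gives AA, Aa, Aa, aa, a 3:1 segregation ratio (dominant : recessive phenotypes)

Back cross: F1 × pure line (Aa × aa) gives Aa, aa, a 1:1 segregation ratio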
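Both segregation ratios can be recovered by enumerating a Punnett square. A minimal sketch, assuming complete dominance of A (the helper names `cross` and `dominant_ratio` are hypothetical):

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Offspring genotype counts from a Punnett square; each parent passes on one allele."""
    counts: Counter = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that 'aA' and 'Aa' are counted as the same genotype ('A' sorts before 'a')
        counts["".join(sorted(allele1 + allele2))] += 1
    return counts


def dominant_ratio(counts: Counter) -> tuple[int, int]:
    """(dominant, recessive) phenotype counts, assuming A is completely dominant."""
    dominant = sum(n for genotype, n in counts.items() if "A" in genotype)
    return dominant, sum(counts.values()) - dominant


f1 = cross("AA", "aa")           # pure lines: every offspring is Aa
intercross = cross("Aa", "Aa")
backcross = cross("Aa", "aa")
print(dominant_ratio(intercross))   # (3, 1)
print(dominant_ratio(backcross))    # (2, 2), i.e. 1:1
```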


Transmission level

Segregation, Meiosis

Mendel’s law of segregation

Father (A1A2) × Mother (A3A4)

Gametes | A3 (½)    | A4 (½)
A1 (½)  | A1A3 (¼)  | A1A4 (¼)
A2 (½)  | A2A3 (¼)  | A2A4 (¼)
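The table above is Mendel’s law of segregation applied to each parent: every allele is transmitted with probability ½, so each ordered (paternal, maternal) combination has probability ¼. A minimal sketch using exact fractions (the function name is hypothetical):

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(
    father: tuple[str, str], mother: tuple[str, str]
) -> dict[tuple[str, str], Fraction]:
    """Ordered offspring genotypes (paternal allele, maternal allele) with their probabilities."""
    half = Fraction(1, 2)  # each parental allele is transmitted with probability 1/2
    dist: dict[tuple[str, str], Fraction] = {}
    for paternal, maternal in product(father, mother):
        genotype = (paternal, maternal)
        dist[genotype] = dist.get(genotype, Fraction(0)) + half * half
    return dist


for genotype, prob in offspring_distribution(("A1", "A2"), ("A3", "A4")).items():
    print(genotype, prob)   # four genotypes, each 1/4
```

Because genotypes are kept ordered, a Dd × Dd cross yields DD, Dd, dD and dd as four separate cells of ¼ each, matching the Punnett squares that follow.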

Transmission level

1. Classical Mendelian traits

- Dominant trait (D = presence, R = absence): AA, Aa → D; aa → R
- Recessive trait (D = absence, R = presence): AA, Aa → D; aa → R
- Codominant trait (X, Y, Z): AA → X; Aa → Y; aa → Z

Phenotype level

2. Dominant Mendelian inheritance

Father (Dd) × Mother (Dd)

Gametes | D (½)   | d (½)
D (½)   | DD (¼)  | Dd (¼)
d (½)   | dD (¼)  | dd (¼)

Affected offspring (complete penetrance): DD, Dd and dD, i.e. ¾

Phenotype level

3. Dominant Mendelian inheritance with incomplete penetrance and phenocopies

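Father (Dd) × Mother (Dd)

Gametes | D (½)   | d (½)
D (½)   | DD (¼)  | Dd (¼)
d (½)   | dD (¼)  | dd (¼)

Incomplete penetrance: some carriers of D (DD, Dd, dD) are unaffected

Phenocopies: some dd individuals are affected (through non-genetic causes)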
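Incomplete penetrance and phenocopies turn the Punnett-square fractions into a risk calculation. A minimal sketch, assuming a dominant model in which every D carrier is affected with probability `penetrance` and every dd individual with probability `phenocopy_rate` (both parameter names and the example values are hypothetical):

```python
from fractions import Fraction


def risk_dominant_intercross(penetrance: Fraction, phenocopy_rate: Fraction) -> Fraction:
    """P(affected child) from a Dd x Dd mating under a dominant model.

    penetrance: P(affected | DD, Dd or dD); phenocopy_rate: P(affected | dd).
    """
    p_carrier = Fraction(3, 4)   # DD, Dd, dD cells of the Punnett square
    p_dd = Fraction(1, 4)        # dd cell
    return p_carrier * penetrance + p_dd * phenocopy_rate


print(risk_dominant_intercross(Fraction(1), Fraction(0)))         # 3/4: complete penetrance, no phenocopies
print(risk_dominant_intercross(Fraction(4, 5), Fraction(1, 25)))  # 61/100 with 80% penetrance, 4% phenocopies
```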

Phenotype level

4. Recessive Mendelian inheritance

Father (Dd) × Mother (Dd)

Gametes | D (½)   | d (½)
D (½)   | DD (¼)  | Dd (¼)
d (½)   | dD (¼)  | dd (¼)

Affected offspring: dd, i.e. ¼

Phenotype level

Two kinds of differences

Continuous: graded, no distinct boundaries (e.g., height, weight, blood pressure, IQ, extraversion)

Categorical: Yes/No, Normal/Affected (dichotomous); None/Mild/Severe (multicategory). These are often called “threshold traits” because people are “affected” if they fall above some level of a measured or hypothesized continuous trait
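The threshold idea can be made concrete by assuming the underlying continuous trait (the liability) is standard normal: the proportion affected is the area above the threshold, and two thresholds split it into three ordered categories. A minimal sketch under that normality assumption; the function names and the two category thresholds are hypothetical:

```python
from statistics import NormalDist

LIABILITY = NormalDist(mu=0.0, sigma=1.0)  # assumption: standard-normal liability


def prevalence(threshold: float) -> float:
    """Proportion 'affected', i.e. whose liability lies above the threshold."""
    return 1.0 - LIABILITY.cdf(threshold)


def threshold_for(target_prevalence: float) -> float:
    """Threshold on the liability scale that yields the requested prevalence."""
    return LIABILITY.inv_cdf(1.0 - target_prevalence)


t = threshold_for(0.05)
print(round(t, 3), round(prevalence(t), 3))   # 1.645 0.05

# Multicategory trait (None/Mild/Severe) with two hypothetical thresholds
t_mild, t_severe = 0.5, 1.5
none = LIABILITY.cdf(t_mild)
mild = LIABILITY.cdf(t_severe) - none
severe = prevalence(t_severe)
print(round(none, 3), round(mild, 3), round(severe, 3))
```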

Phenotype level

Polygenic Traits

Genes | Genotypes | Phenotypes
1     | 3         | 3
2     | 9         | 5
3     | 27        | 7
4     | 81        | 9

Mendel’s Experiments in Plant Hybridization showed how discrete particles (the particulate theory of inheritance) behave mathematically: all-or-nothing states (round/wrinkled, green/yellow), as in “Mendelian” disease. How do these particles produce a continuous trait like stature, or liability to a complex disorder?
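The table can be reproduced by enumeration if we assume every gene is biallelic and each “increasing” allele adds the same amount to the trait, with no dominance, so that the phenotype depends only on the total number of increasing alleles (3ⁿ genotypes, 2n + 1 phenotype classes). The sketch below makes those assumptions explicit; it also assumes allele frequency ½ at every gene to show how the class frequencies pile up in the middle as genes are added. Function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def polygenic_classes(n_genes: int) -> Counter:
    """Number of genotypes in each phenotype class (total count of increasing alleles)."""
    classes: Counter = Counter()
    for genotype in product((0, 1, 2), repeat=n_genes):   # 0 = aa, 1 = Aa, 2 = AA at each gene
        classes[sum(genotype)] += 1
    return classes


def class_frequencies(n_genes: int) -> list[float]:
    """Phenotype class frequencies when every allele frequency is 1/2 (random mating)."""
    return [comb(2 * n_genes, k) / 4 ** n_genes for k in range(2 * n_genes + 1)]


for n in range(1, 5):
    classes = polygenic_classes(n)
    print(f"{n} gene(s): {sum(classes.values())} genotypes, {len(classes)} phenotypes")
print(class_frequencies(2))   # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```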

Phenotype level

Quantitative traits

[Figure: histograms of a quantitative trait (qt) for each genotype (AA, Aa, aa) and for the whole sample]

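Phenotype level

Biometric Model

[Figure: distributions P(X) of trait X for genotypes aa, Aa and AA, with genotypic means m − a, m + d and m + a]

Genotypic effects: −a (aa), d (Aa) and +a (AA), expressed as deviations from m, the midpoint between the two homozygote means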

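Very Basic Statistical Concepts

1. Mean (X)
2. Variance (X)
3. Covariance (X,Y)
4. Correlation (X,Y)

Mean, variance, covariance

1. Mean (X)

$$E(X) = \mu_X = \frac{\sum_i x_i}{n} = \sum_i x_i\, f(x_i)$$

2. Variance (X)

$$\text{Var}(X) = E\left[(X - \mu_X)^2\right] = \sum_i (x_i - \mu_X)^2 f(x_i), \quad \text{estimated from a sample of } n \text{ by } \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$$

3. Covariance (X,Y)

$$\text{Cov}(X,Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right] = \sum_i (x_i - \mu_X)(y_i - \mu_Y)\, f(x_i, y_i), \quad \text{estimated by } \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

4. Correlation (X,Y)

$$r_{X,Y} = \frac{\text{Cov}(X,Y)}{s_X\, s_Y}$$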
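The four sample statistics above can be computed directly from their definitions. A minimal sketch using the n − 1 denominator for variance and covariance; the twin-pair scores are made up for illustration:

```python
from math import sqrt


def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


def variance(xs: list[float]) -> float:
    """Sample variance with the n - 1 denominator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs: list[float], ys: list[float]) -> float:
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs: list[float], ys: list[float]) -> float:
    """Covariance scaled by the two standard deviations."""
    return covariance(xs, ys) / (sqrt(variance(xs)) * sqrt(variance(ys)))


# Hypothetical scores for four twin pairs (twin 1, twin 2)
twin1 = [1.0, 2.0, 3.0, 4.0]
twin2 = [2.0, 4.0, 6.0, 8.0]
print(mean(twin1), variance(twin1), covariance(twin1, twin2), correlation(twin1, twin2))
```

Since twin 2’s scores are an exact linear function of twin 1’s here, the correlation comes out as 1 (up to rounding).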

Biometrical model for single biallelic QTL

Biallelic locus
- Genotypes: AA, Aa, aa
- Genotype frequencies: p², 2pq, q²

Alleles at this locus are transmitted from parent to offspring according to Mendel’s law of segregation

Genotypes at this locus influence the expression of a quantitative trait X (i.e., the locus is a QTL, a quantitative trait locus)

A biometrical genetic model estimates the contribution of this QTL to the (1) mean, (2) variance and (3) covariance between individuals for this quantitative trait X


1. Contribution of the QTL to the Mean (X)

Genotype       | AA | Aa  | aa
Frequency f(x) | p² | 2pq | q²
Effect x       | a  | d   | −a

$$\text{Mean}(X) = \sum_i x_i f(x_i) = a(p^2) + d(2pq) - a(q^2) = a(p-q) + 2pqd$$

(since a p² − a q² = a(p − q)(p + q) and p + q = 1)
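The mean is just the frequency-weighted sum of the three genotypic effects. A minimal sketch that computes it both ways, assuming Hardy-Weinberg frequencies; the function name and parameter values are hypothetical:

```python
def qtl_mean(a: float, d: float, p: float) -> float:
    """Mean of X contributed by one biallelic QTL: sum of effect x frequency."""
    q = 1.0 - p
    effects = {"AA": a, "Aa": d, "aa": -a}                  # deviations from the homozygote midpoint
    freqs = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}     # Hardy-Weinberg frequencies
    return sum(effects[g] * freqs[g] for g in effects)


a, d, p = 1.0, 0.5, 0.3          # hypothetical parameter values
q = 1.0 - p
closed_form = a * (p - q) + 2 * p * q * d
print(qtl_mean(a, d, p), closed_form)   # both about -0.19
```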

Biometrical model for single biallelic QTL

2. Contribution of the QTL to the Variance (X)

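Using the same genotypes (AA, Aa, aa), frequencies (p², 2pq, q²) and effects (a, d, −a):

$$\text{Var}(X) = \sum_i (x_i - m)^2 f(x_i) = (a-m)^2 p^2 + (d-m)^2\, 2pq + (-a-m)^2 q^2 = V_{QTL}$$

where m = a(p − q) + 2pqd is the mean of X from step 1 (here m denotes the population mean, not the homozygote midpoint).

$$\text{Broad-sense heritability of } X \text{ at this locus} = \frac{V_{QTL}}{V_{Total}}$$

$$\text{Broad-sense total heritability of } X = \frac{\sum V_{QTL}}{V_{Total}}$$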

Biometrical model for single biallelic QTL

$$\text{Var}(X) = (a-m)^2 p^2 + (d-m)^2\, 2pq + (-a-m)^2 q^2 = 2pq\,[a + (q-p)d]^2 + (2pqd)^2 = V_{A,QTL} + V_{D,QTL}$$

[Figure: genotypic effects −a (aa), d (Aa) and +a (AA) around the midpoint m, for d = 0]

Additive effects: the main effects of individual alleles

Dominance effects: represent the interaction between alleles

Biometrical model for single biallelic QTL

[Figures: the same variance decomposition, Var(X) = V_A,QTL + V_D,QTL, shown with a dominance deviation d > 0 and with d < 0]
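The decomposition can be checked numerically for the three cases d = 0, d > 0 and d < 0. A minimal sketch, assuming Hardy-Weinberg frequencies and using m as the population mean from step 1; the function name and parameter values are hypothetical:

```python
def qtl_variance_components(a: float, d: float, p: float) -> tuple[float, float, float]:
    """Return (V_QTL computed directly, V_A,QTL, V_D,QTL) for one biallelic QTL."""
    q = 1.0 - p
    m = a * (p - q) + 2 * p * q * d                 # population mean of X (step 1)
    freqs = (p * p, 2 * p * q, q * q)               # AA, Aa, aa
    effects = (a, d, -a)
    v_direct = sum(f * (x - m) ** 2 for f, x in zip(freqs, effects))
    v_a = 2 * p * q * (a + (q - p) * d) ** 2        # additive component
    v_d = (2 * p * q * d) ** 2                      # dominance component
    return v_direct, v_a, v_d


for d in (0.0, 0.5, -0.5):   # d = 0, d > 0, d < 0 (hypothetical values)
    v, va, vd = qtl_variance_components(a=1.0, d=d, p=0.3)
    print(f"d={d:+.1f}: V_QTL={v:.4f}  V_A={va:.4f}  V_D={vd:.4f}  V_A+V_D={va + vd:.4f}")
```

With d = 0 the dominance component vanishes and the whole locus variance is additive.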

Biometrical model for single biallelic QTL

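$$\text{Var}(X) = \text{Regression variance} + \text{Residual variance} = \text{Additive variance} + \text{Dominance variance}$$

[Figure: genotypic values −a (aa), d (Aa) and +a (AA) around the midpoint m, with the regression line of genotypic value on genotype]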
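One way to see why the regression variance equals the additive variance is to regress genotypic value on the number of A alleles (2, 1, 0), weighting each genotype by its Hardy-Weinberg frequency. The sketch below does that under those assumptions (hypothetical function name and values); the fitted slope equals a + d(q − p), the regression variance equals 2pq[a + (q − p)d]², and the residual variance equals (2pqd)².

```python
def regression_partition(a: float, d: float, p: float) -> tuple[float, float, float]:
    """Weighted regression of genotypic value on allele count.

    Returns (slope, regression variance, residual variance).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    n_A = (2, 1, 0)                   # number of A alleles in AA, Aa, aa
    value = (a, d, -a)                # genotypic effects from the biometric model
    w = (p * p, 2 * p * q, q * q)     # Hardy-Weinberg genotype frequencies
    mean_x = sum(wi * x for wi, x in zip(w, n_A))
    mean_y = sum(wi * y for wi, y in zip(w, value))
    var_x = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, n_A))
    cov_xy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, n_A, value))
    slope = cov_xy / var_x
    regression_var = slope ** 2 * var_x
    residual_var = sum(
        wi * (y - mean_y - slope * (x - mean_x)) ** 2 for wi, x, y in zip(w, n_A, value)
    )
    return slope, regression_var, residual_var


a, d, p = 1.0, 0.5, 0.3      # hypothetical parameter values
q = 1.0 - p
slope, reg_var, res_var = regression_partition(a, d, p)
print(slope, a + d * (q - p))                        # slope = a + d(q - p)
print(reg_var, 2 * p * q * (a + (q - p) * d) ** 2)   # regression variance = V_A,QTL
print(res_var, (2 * p * q * d) ** 2)                 # residual variance = V_D,QTL
```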

Biometrical model for single biallelic QTL

$$\text{Var}(X) = 2pq\,[a + (q-p)d]^2 + (2pqd)^2 = V_{A,QTL} + V_{D,QTL}$$

Demonstrate:
- 2A. Average allelic effect
- 2B. Additive genetic variance

NOTE: additive genetic variance depends on allele frequency p and additive genetic value a, as well as on the dominance deviation d

Additive genetic variance is typically greater than dominance variance

Biometrical model for single biallelic QTL

2A. Average allelic effect (α)

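The deviation of the allelic mean from the population mean, Mean (X) = a(p − q) + 2pqd

Allele | AA (a) | Aa (d) | aa (−a) | Allelic mean | Average allelic effect (α)
A      | p      | q      |         | ap + dq      | q(a + d(q − p))
a      |        | p      | q       | dp − aq      | −p(a + d(q − p))

Each row gives the genotypes an allele ends up in: paired with an A allele (probability p) or with an a allele (probability q).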

Biometrical model for single biallelic QTL

Denote the average allelic effects:

$$\alpha_A = q\,(a + d(q-p)), \qquad \alpha_a = -p\,(a + d(q-p))$$

If only two alleles exist, we can define the average effect of allele substitution:

$$\alpha = \alpha_A - \alpha_a = (q - (-p))\,(a + d(q-p)) = a + d(q-p)$$

Therefore:

$$\alpha_A = q\alpha, \qquad \alpha_a = -p\alpha$$
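The average allelic effects can be computed straight from the allelic-mean table and then compared with qα and −pα. A minimal sketch, assuming Hardy-Weinberg frequencies; the function name and parameter values are hypothetical:

```python
def average_effects(a: float, d: float, p: float) -> tuple[float, float]:
    """Average allelic effects (alpha_A, alpha_a): allelic mean minus population mean."""
    q = 1.0 - p
    pop_mean = a * (p - q) + 2 * p * q * d
    mean_A = p * a + q * d     # allele A paired with A (prob p) -> AA, or with a (prob q) -> Aa
    mean_a = p * d - q * a     # allele a paired with A (prob p) -> Aa, or with a (prob q) -> aa
    return mean_A - pop_mean, mean_a - pop_mean


a, d, p = 1.0, 0.5, 0.3      # hypothetical parameter values
q = 1.0 - p
alpha_A, alpha_a = average_effects(a, d, p)
alpha = a + d * (q - p)      # average effect of allele substitution
print(alpha_A, q * alpha)    # alpha_A = q * alpha
print(alpha_a, -p * alpha)   # alpha_a = -p * alpha
```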

Biometrical model for single biallelic QTL

2B. Additive genetic variance

The variance of the average allelic effects

Genotype | Frequency | Additive effect
AA       | p²        | 2αA = 2qα
Aa       | 2pq       | αA + αa = (q − p)α
aa       | q²        | 2αa = −2pα

(using αA = qα and αa = −pα from 2A)

$$V_{A,QTL} = (2q\alpha)^2 p^2 + ((q-p)\alpha)^2\, 2pq + (-2p\alpha)^2 q^2 = 2pq\,\alpha^2 = 2pq\,[a + d(q-p)]^2$$

$$d = 0: \; V_{A,QTL} = 2pq\,a^2 \qquad\qquad p = q: \; V_{A,QTL} = \tfrac{1}{2}a^2$$
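The additive variance is the variance of the additive values 2αA, αA + αa and 2αa across genotypes. A minimal sketch that computes it that way and compares it with 2pqα², assuming Hardy-Weinberg frequencies; the function name and parameter values are hypothetical:

```python
def additive_variance(a: float, d: float, p: float) -> tuple[float, float]:
    """Return (variance of additive values, closed form 2pq * alpha^2)."""
    q = 1.0 - p
    alpha = a + d * (q - p)                   # average effect of allele substitution
    alpha_A, alpha_a = q * alpha, -p * alpha  # average allelic effects
    freqs = (p * p, 2 * p * q, q * q)         # AA, Aa, aa
    additive = (2 * alpha_A, alpha_A + alpha_a, 2 * alpha_a)
    # The additive values have mean zero, so their variance is a weighted sum of squares
    direct = sum(f * v ** 2 for f, v in zip(freqs, additive))
    return direct, 2 * p * q * alpha ** 2


print(additive_variance(1.0, 0.5, 0.3))   # the two numbers agree
print(additive_variance(2.0, 0.0, 0.3))   # d = 0: 2pq a^2
print(additive_variance(2.0, 0.7, 0.5))   # p = q: a^2 / 2
```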

Biometrical model for single biallelic QTL

1. Contribution of the QTL to the Mean (X)

2. Contribution of the QTL to the Variance (X)
- 2A. Average allelic effect (α)
- 2B. Additive genetic variance

3. Contribution of the QTL to the Covariance (X,Y)

Biometrical model for single biallelic QTL

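3. Contribution of the QTL to the Cov (X,Y)

$$\text{Cov}(X,Y) = \sum_i (x_i - m)(y_i - m)\, f(x_i, y_i)$$

Products of deviations from the mean m (rows: genotype of X; columns: genotype of Y):

           | AA (a−m)     | Aa (d−m)     | aa (−a−m)
AA (a−m)   | (a−m)²       | (a−m)(d−m)   | (a−m)(−a−m)
Aa (d−m)   | (d−m)(a−m)   | (d−m)²       | (d−m)(−a−m)
aa (−a−m)  | (−a−m)(a−m)  | (−a−m)(d−m)  | (−a−m)²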

Biometrical model for single biallelic QTL

3A. Contribution of the QTL to the Cov (X,Y) – MZ twins

Joint genotype frequencies f(x_i, y_i) (rows: twin 1; columns: twin 2):

     | AA | Aa  | aa
AA   | p² | 0   | 0
Aa   | 0  | 2pq | 0
aa   | 0  | 0   | q²

$$\text{Cov}(X_i, X_j) = (a-m)^2 p^2 + (d-m)^2\, 2pq + (-a-m)^2 q^2 = 2pq\,[a+(q-p)d]^2 + (2pqd)^2 = V_{A,QTL} + V_{D,QTL}$$

Biometrical model for single biallelic QTL

3B. Contribution of the QTL to the Cov (X,Y) – Parent-Offspring

Joint genotype frequencies (rows: parent; columns: offspring):

     | AA  | Aa  | aa
AA   | p³  | p²q | 0
Aa   | p²q | pq  | pq²
aa   | 0   | pq² | q³

Biometrical model for single biallelic QTL

e.g., given an AA father, an AA offspring can come from either AA × AA or AA × Aa parental mating types:
- AA × AA occurs with probability p² × p² = p⁴ and has AA offspring with probability 1
- AA × Aa occurs with probability p² × 2pq = 2p³q and has AA offspring with probability 0.5 (and Aa offspring with probability 0.5)

Therefore:

$$P(\text{AA father and AA offspring}) = p^4 + \tfrac{1}{2}(2p^3q) = p^4 + p^3q = p^3(p+q) = p^3$$

Biometrical model for single biallelic QTL

$$\text{Cov}(X_i, X_j) = (a-m)^2 p^3 + \dots + (-a-m)^2 q^3 = \tfrac{1}{2} V_{A,QTL} = pq\,[a+(q-p)d]^2$$
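The parent-offspring result can be checked without the joint-frequency table by enumerating the four parental alleles and the child’s transmissions. A minimal sketch, assuming random mating, Hardy-Weinberg allele frequencies and Mendelian segregation; the function names and parameter values are hypothetical:

```python
from itertools import product


def genotypic_value(n_A: int, a: float, d: float) -> float:
    """Biometric-model value for a genotype carrying n_A copies of allele A."""
    return (-a, d, a)[n_A]          # 0 -> aa, 1 -> Aa, 2 -> AA


def parent_offspring_cov(a: float, d: float, p: float) -> float:
    """Exact Cov(father, child) at one QTL by enumerating parental alleles and transmissions."""
    q = 1.0 - p
    prob = {1: p, 0: q}             # 1 codes allele A, 0 codes allele a
    e_xy = e_x = e_y = 0.0
    for f1, f2, m1, m2 in product((0, 1), repeat=4):   # father's and mother's alleles
        w_parents = prob[f1] * prob[f2] * prob[m1] * prob[m2]
        x = genotypic_value(f1 + f2, a, d)              # father's value
        for paternal, maternal in product((f1, f2), (m1, m2)):
            w = w_parents / 4                           # each transmission pair has probability 1/4
            y = genotypic_value(paternal + maternal, a, d)   # child's value
            e_xy += w * x * y
            e_x += w * x
            e_y += w * y
    return e_xy - e_x * e_y


a, d, p = 1.0, 0.5, 0.3          # hypothetical parameter values
q = 1.0 - p
print(parent_offspring_cov(a, d, p), p * q * (a + (q - p) * d) ** 2)   # both about 0.3024
```

The dominance term drops out because a parent transmits only one allele, so parent and child never share both alleles through that parent.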

Biometrical model for single biallelic QTL

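3C. Contribution of the QTL to the Cov (X,Y) – Unrelated individuals

Joint genotype frequencies (rows: individual i; columns: individual j):

     | AA    | Aa    | aa
AA   | p⁴    | 2p³q  | p²q²
Aa   | 2p³q  | 4p²q² | 2pq³
aa   | p²q²  | 2pq³  | q⁴

Each joint frequency is the product of the two marginal genotype frequencies, so the genotypes of unrelated individuals are independent:

$$\text{Cov}(X_i, X_j) = (a-m)^2 p^4 + \dots + (-a-m)^2 q^4 = 0$$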

Biometrical model for single biallelic QTL

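3D. Contribution of the QTL to the Cov (X,Y) – DZ twins and full sibs

Number of identical alleles inherited from parents:
- 2 alleles: ¼ of the genome (like MZ twins)
- 1 allele, from the mother or from the father (¼ each): ½ of the genome (like parent-offspring)
- 0 alleles: ¼ of the genome (like unrelated individuals)

$$\text{Cov}(X_i, X_j) = \tfrac14\,\text{Cov(MZ)} + \tfrac12\,\text{Cov(P-O)} + \tfrac14\,\text{Cov(Unrel)} = \tfrac14\,(V_{A,QTL} + V_{D,QTL}) + \tfrac12\left(\tfrac12 V_{A,QTL}\right) + \tfrac14\,(0) = \tfrac12 V_{A,QTL} + \tfrac14 V_{D,QTL}$$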
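The ¼ / ½ / ¼ weighting argument can be confirmed by brute force: enumerate both parents’ alleles, let each sib receive an independent transmission from each parent, and compute the covariance exactly. A minimal sketch, assuming random mating, Hardy-Weinberg allele frequencies and independent meioses for the two sibs; the function names and parameter values are hypothetical:

```python
from itertools import product


def genotypic_value(n_A: int, a: float, d: float) -> float:
    """Biometric-model value for a genotype carrying n_A copies of allele A."""
    return (-a, d, a)[n_A]          # 0 -> aa, 1 -> Aa, 2 -> AA


def sibling_cov(a: float, d: float, p: float) -> float:
    """Exact Cov between full sibs at one QTL, by enumerating parents and transmissions."""
    q = 1.0 - p
    prob = {1: p, 0: q}             # 1 codes allele A, 0 codes allele a
    e_xy = e_x = e_y = 0.0
    for f1, f2, m1, m2 in product((0, 1), repeat=4):   # father's and mother's alleles
        w_parents = prob[f1] * prob[f2] * prob[m1] * prob[m2]
        gametes = list(product((f1, f2), (m1, m2)))     # 4 equally likely (paternal, maternal) pairs
        for (pat1, mat1), (pat2, mat2) in product(gametes, repeat=2):   # independent for each sib
            w = w_parents / 16
            x = genotypic_value(pat1 + mat1, a, d)
            y = genotypic_value(pat2 + mat2, a, d)
            e_xy += w * x * y
            e_x += w * x
            e_y += w * y
    return e_xy - e_x * e_y


a, d, p = 1.0, 0.5, 0.3          # hypothetical parameter values
q = 1.0 - p
v_a = 2 * p * q * (a + (q - p) * d) ** 2
v_d = (2 * p * q * d) ** 2
print(sibling_cov(a, d, p), 0.5 * v_a + 0.25 * v_d)   # both about 0.313425
```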

Biometrical model for single biallelic QTL

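Biometrical model predicts the contribution of a QTL to the mean, variance and covariances of a trait

1 QTL:

$$\text{Var}(X) = V_{A,QTL} + V_{D,QTL}$$
$$\text{Cov(MZ)} = V_{A,QTL} + V_{D,QTL}$$
$$\text{Cov(DZ)} = \tfrac12 V_{A,QTL} + \tfrac14 V_{D,QTL}$$

Multiple QTL:

$$\text{Var}(X) = \sum V_{A,QTL} + \sum V_{D,QTL} = V_A + V_D$$
$$\text{Cov(MZ)} = \sum V_{A,QTL} + \sum V_{D,QTL} = V_A + V_D$$
$$\text{Cov(DZ)} = \sum \tfrac12 V_{A,QTL} + \sum \tfrac14 V_{D,QTL} = \tfrac12 V_A + \tfrac14 V_D$$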

Summary

Biometrical model underlies the variance components estimation performed in Mx

$$\text{Var}(X) = V_A + V_D + V_E$$
$$\text{Cov(MZ)} = V_A + V_D$$
$$\text{Cov(DZ)} = \tfrac12 V_A + \tfrac14 V_D$$
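With three observed statistics (the variance, the MZ covariance and the DZ covariance) and three unknowns, these expectations can be solved exactly. The sketch below does this by simple algebra: 4·Cov(DZ) − Cov(MZ) = V_A and 2·Cov(MZ) − 4·Cov(DZ) = V_D. It is a method-of-moments illustration only, not the maximum-likelihood model fitting that Mx performs; the function name and input statistics are hypothetical. A negative estimate would signal that this ADE model does not fit the data.

```python
def solve_ade(var_x: float, cov_mz: float, cov_dz: float) -> tuple[float, float, float]:
    """Solve Var = VA+VD+VE, CovMZ = VA+VD, CovDZ = VA/2 + VD/4 for (VA, VD, VE)."""
    v_a = 4 * cov_dz - cov_mz        # (2VA + VD) - (VA + VD)
    v_d = 2 * cov_mz - 4 * cov_dz    # CovMZ - VA
    v_e = var_x - cov_mz             # what MZ twins do not share
    return v_a, v_d, v_e


# Hypothetical standardized statistics (variance 1, so covariances equal correlations)
v_a, v_d, v_e = solve_ade(var_x=1.0, cov_mz=0.8, cov_dz=0.3)
print(round(v_a, 3), round(v_d, 3), round(v_e, 3))   # 0.4 0.4 0.2
```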

Path Analysis

HGEN502, 2011

Hermine H. Maes

Model Building

Write equations for the means, variances and covariances of different types of relatives, or

draw path diagrams for easy derivation of expected means, variances and covariances, and for translation to a mathematical formulation

Method of Path Analysis

Allows us to represent linear models for the relationship between variables in diagrammatic form, e.g. a genetic model; a factor model; a regression model

Makes it easy to derive expectations for the variances and covariances of variables in terms of the parameters of the proposed linear model

Permits easy translation into matrix formulation as used by statistical programs

Path Diagram Variables

Squares or rectangles denote observed variables

Circles or ellipses denote latent (unmeasured) variables

Upper-case letters are used to denote variables. Lower-case letters (or numeric values) are used to denote covariances or path coefficients

[Figure: examples of latent variables (circles) and observed variables (squares)]

Path Diagram Arrows

Single-headed arrows or paths (–>) are used to represent causal relationships between variables under a particular model - where the variable at the tail is hypothesized to have a direct influence on the variable at the head

Double-headed arrows (<–>) represent a covariance between two variables, which may arise through common causes not represented in the model. They may also be used to represent the variance of a variable

[Figure: examples of double-headed and single-headed arrows]

Path Analysis Tracing Rules

Trace backwards, change direction at a two-headed arrow, then trace forwards (this implies that a chain can never pass through more than one two-headed arrow).

The expected covariance between two variables, or the expected variance of a variable, is computed by multiplying together all the coefficients in a chain, and then summing over all possible chains.

Non-genetic Example

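[Figure: path diagram relating variables A, B and C]

$$\text{Cov}_{AB} = kl + mqn + mpl$$

Expectations

Cov AB = ?   Cov BC = ?   Cov AC = ?   Var A = ?   Var B = ?   Var C = ?

Expectations

$$\text{Cov}_{AB} = kl + mqn + mpl \qquad \text{Cov}_{BC} = no \qquad \text{Cov}_{AC} = mqo$$

$$\text{Var}_A = k^2 + m^2 + 2kpm \qquad \text{Var}_B = l^2 + n^2 \qquad \text{Var}_C = o^2$$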
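The tracing rules are a shortcut for covariance algebra. The diagram itself is not reproduced here, so the sketch below assumes a structure reconstructed from the listed expectations: latent X, Y, Z with unit variances, A = kX + mY, B = lX + nZ, C = oZ, a correlation p between X and Y, a correlation q between Y and Z, and no path between X and Z. Under that assumption, expanding each covariance bilinearly gives the same numbers as the traced chains for arbitrary coefficient values.

```python
import random


def cov_linear(u: list[float], v: list[float], latent_cov: list[list[float]]) -> float:
    """Cov(u.L, v.L) = sum_ij u_i v_j Cov(L_i, L_j) for a vector L of latent variables."""
    return sum(u[i] * v[j] * latent_cov[i][j] for i in range(len(u)) for j in range(len(v)))


rng = random.Random(2011)
k, l, m, n, o, p, q = (rng.uniform(0.1, 0.9) for _ in range(7))

# Assumed (reconstructed) diagram: X, Y, Z with unit variances,
# X <-> Y correlation p, Y <-> Z correlation q, no X <-> Z arrow.
latent_cov = [[1.0, p, 0.0],
              [p, 1.0, q],
              [0.0, q, 1.0]]
A = [k, m, 0.0]    # paths into A from X, Y, Z
B = [l, 0.0, n]    # paths into B
C = [0.0, 0.0, o]  # paths into C

algebra = {
    "Cov AB": cov_linear(A, B, latent_cov),
    "Cov BC": cov_linear(B, C, latent_cov),
    "Cov AC": cov_linear(A, C, latent_cov),
    "Var A": cov_linear(A, A, latent_cov),
    "Var B": cov_linear(B, B, latent_cov),
    "Var C": cov_linear(C, C, latent_cov),
}
traced = {   # expectations from the tracing rules
    "Cov AB": k * l + m * q * n + m * p * l,
    "Cov BC": n * o,
    "Cov AC": m * q * o,
    "Var A": k**2 + m**2 + 2 * k * p * m,
    "Var B": l**2 + n**2,
    "Var C": o**2,
}
for key in traced:
    print(key, round(algebra[key], 6), round(traced[key], 6))
```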

Genetic Examples

- MZ Twins Reared Together
- DZ Twins Reared Together
- MZ Twins Reared Apart
- DZ Twins Reared Apart
- Parents & Offspring

MZ Twins Reared Together

Expected Covariance (variances on the diagonal, covariances off the diagonal)

        | Twin 1      | Twin 2
Twin 1  | a²+c²+e²    | a²+c²
Twin 2  | a²+c²       | a²+c²+e²

[Figure: path diagram for MZ twins reared together]

DZ Twins Reared Together

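Expected Covariance (variances on the diagonal, covariances off the diagonal)

        | Twin 1      | Twin 2
Twin 1  | a²+c²+e²    | .5a²+c²
Twin 2  | .5a²+c²     | a²+c²+e²

[Figure: path diagram for DZ twins reared together]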
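The MZ and DZ matrices differ only in the correlation between the twins’ additive genetic factors (1 for MZ, .5 for DZ); the shared-environment factors correlate 1 in both. A minimal sketch that builds the expected 2 × 2 matrix from the path coefficients a, c and e, assuming standardized latent factors; the function name and example values are hypothetical:

```python
def expected_twin_cov(a: float, c: float, e: float, zygosity: str) -> list[list[float]]:
    """Expected 2x2 covariance matrix for a twin pair under the ACE path model."""
    genetic_corr = {"MZ": 1.0, "DZ": 0.5}[zygosity]   # double-headed arrow between A1 and A2
    # Variance: chains a*1*a, c*1*c and e*1*e through each latent factor's unit variance
    variance = a**2 + c**2 + e**2
    # Covariance: chains through the A1<->A2 and C1<->C2 double-headed arrows
    covariance = genetic_corr * a**2 + c**2
    return [[variance, covariance], [covariance, variance]]


for zyg in ("MZ", "DZ"):
    print(zyg, expected_twin_cov(a=0.7, c=0.4, e=0.5, zygosity=zyg))
```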

MZ Twins Reared Apart

DZ Twins Reared Apart

Twins and Parents

Role of model mediating between theory and data