
Page 1: Association genetics in forest trees Santiago C. González-Martínez Center of Forest Research, INIA, PO Box 8111 28080 Madrid, Spain santiago@inia.es.

Association genetics in forest trees

Santiago C. González-Martínez

Center of Forest Research, INIA, PO Box 8111 28080 Madrid, Spain

santiago@inia.es

Page 2

SNP 1  SNP 2  SNP 3  Trait 1  Trait 2
AT     GT     CC     1        3
AT     GT     GG     10       4
TT     TT     GG     10       7
AA     GG     CG     5        1
AA     GG     CG     5        3
AT     TT     CG     5        5
AA     GT     CC     1        2
TT     GT     GG     10       8
TT     TT     GG     10       7
TT     TT     CC     1        10
AT     GG     CG     5        6
AA     GT     CG     5        4
AA     TT     CG     5        1
TT     TT     GG     10       9
AT     GG     CC     1        6
AT     GT     CG     5        4
AT     GG     CC     1        5
AA     GG     CC     1        2
AA     GG     GG     10       1
TT     GT     GG     10       8

Genotype frequencies
SNP 1: f(AA) = 0.35, f(AT) = 0.35, f(TT) = 0.30
SNP 2: f(GG) = 0.35, f(GT) = 0.35, f(TT) = 0.30
SNP 3: f(CC) = 0.30, f(CG) = 0.35, f(GG) = 0.35

Trait 1 means by genotype
SNP 1: μ(AA) = 32/7 = 4.57, μ(AT) = 28/7 = 4.00, μ(TT) = 51/6 = 8.50
SNP 2: μ(GG) = 28/7 = 4.00, μ(GT) = 42/7 = 6.00, μ(TT) = 41/6 = 6.83
SNP 3: μ(CC) = 6/6 = 1.00, μ(CG) = 35/7 = 5.00, μ(GG) = 70/7 = 10.00

Trait 2 means by genotype
SNP 1: μ(AA) = 14/7 = 2.00, μ(AT) = 33/7 = 4.71, μ(TT) = 49/6 = 8.17
SNP 2: μ(GG) = 24/7 = 3.43, μ(GT) = 33/7 = 4.71, μ(TT) = 39/6 = 6.50
SNP 3: μ(CC) = 28/6 = 4.67, μ(CG) = 24/7 = 3.43, μ(GG) = 44/7 = 6.29

What is association genetics?
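The genotype frequencies and trait means of the example can be reproduced with a few lines of code. This is a minimal sketch in Python: `DATA` is transcribed from the example table and `genotype_summary` is a hypothetical helper, not part of any package. It makes the point of the example explicit: Trait 1 means by SNP 3 genotype are 1, 5 and 10 (the trait is fully determined by SNP 3), while Trait 2 means rise from 2.00 to 4.71 to 8.17 across the SNP 1 genotypes.

```python
from collections import defaultdict

# The 20 individuals of the example table: (SNP 1, SNP 2, SNP 3, Trait 1, Trait 2)
DATA = [
    ("AT", "GT", "CC", 1, 3), ("AT", "GT", "GG", 10, 4), ("TT", "TT", "GG", 10, 7),
    ("AA", "GG", "CG", 5, 1), ("AA", "GG", "CG", 5, 3), ("AT", "TT", "CG", 5, 5),
    ("AA", "GT", "CC", 1, 2), ("TT", "GT", "GG", 10, 8), ("TT", "TT", "GG", 10, 7),
    ("TT", "TT", "CC", 1, 10), ("AT", "GG", "CG", 5, 6), ("AA", "GT", "CG", 5, 4),
    ("AA", "TT", "CG", 5, 1), ("TT", "TT", "GG", 10, 9), ("AT", "GG", "CC", 1, 6),
    ("AT", "GT", "CG", 5, 4), ("AT", "GG", "CC", 1, 5), ("AA", "GG", "CC", 1, 2),
    ("AA", "GG", "GG", 10, 1), ("TT", "GT", "GG", 10, 8),
]


def genotype_summary(snp, trait):
    """Return {genotype: (frequency, mean trait value)} for one SNP (index 0-2) and trait (index 0-1)."""
    groups = defaultdict(list)
    for row in DATA:
        groups[row[snp]].append(row[3 + trait])  # trait columns start at index 3
    n = len(DATA)
    return {g: (len(v) / n, sum(v) / len(v)) for g, v in groups.items()}


for snp in range(3):
    for trait in range(2):
        summary = genotype_summary(snp, trait)
        cells = ", ".join(f"{g}: f={f:.2f} mean={m:.2f}" for g, (f, m) in sorted(summary.items()))
        print(f"SNP {snp + 1}, Trait {trait + 1} -> {cells}")
```

Comparing genotype means in this way is exactly what the single-marker tests described later formalize.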

Page 3

Linkage versus Association: finding the molecular variation underlying complex traits

[Figure: a favourable mutation (X) followed over several generations in a mapping pedigree and in a natural population (= multiple genetic backgrounds); LG = linkage group]

Page 4

For which organisms is genetic association a promising approach?

• Relatively undomesticated species with outbred mating systems and large natural populations.

• Organisms with long life spans, where generating pedigrees would take several years.

• Organisms (such as humans) for which artificial crosses are impossible or difficult to obtain (incompatible species).

• In plants: the opportunity to test multiple traits and phenotypes for genetic association in long-term common garden experiments (clonal tests give high precision in the estimation of phenotypes).

The ‘immortal’ association population

Page 5

Linkage disequilibrium and association

[Figure, panels a, b, c: Stumpf & McVean (2003) Nature Reviews Genetics]

LD decays rapidly in conifers, but it may extend further in regions under selection. For example, LD extends over 800 kb around the Y1 gene in maize (Palaisa et al. 2004), a species that otherwise also shows rapid decay of LD with physical distance (Remington et al. 2001).

[Figure: LD (r², 0 to 0.5) versus distance (0 to 3500 base pairs) in Picea abies: all samples, P. abies without Romania, Baltico-Nordic domain, Alpine domain. Heuertz et al. 2006, Genetics]
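The r² on the y-axis of the LD decay plot is the squared correlation between alleles at two loci, r² = D²/(p_A(1 - p_A) p_B(1 - p_B)), with D = p_AB - p_A p_B. A minimal Python sketch, assuming phased two-locus haplotypes are available; the `ld_r2` helper and the haplotype sample are hypothetical:

```python
def ld_r2(haplotypes):
    """Return (D, r2) for two biallelic loci from two-character haplotypes such as "AB" or "ab".

    D = p_AB - p_A * p_B and r2 = D**2 / (p_A (1 - p_A) p_B (1 - p_B)). The reference alleles
    are taken from the first haplotype; r2 does not depend on this choice.
    """
    n = len(haplotypes)
    ref1, ref2 = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(h[0] == ref1 for h in haplotypes) / n
    p_b = sum(h[1] == ref2 for h in haplotypes) / n
    p_ab = sum(h[0] == ref1 and h[1] == ref2 for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined if a locus is monomorphic
    return d, r2


# Hypothetical sample of 8 haplotypes: mostly AB and ab, plus two recombinant haplotypes
sample = ["AB", "AB", "AB", "Ab", "aB", "ab", "ab", "ab"]
print(ld_r2(sample))  # -> (0.125, 0.25)
```

With unphased genotypes, as is common in conifer studies, haplotype frequencies have to be estimated first (e.g. by EM); that step is not shown.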

Page 6

Extent of LD and association: higher LD makes it easier to detect associations but more difficult to identify the causal mutations

Variation among genes: [Figure: LD (r², 0 to 0.5) versus distance (0 to 3500 base pairs) in Picea abies: all samples, P. abies without Romania, Baltico-Nordic domain, Alpine domain]

Variation among species: [Figure: LD decay in conifers versus humans; Stumpf & McVean (2003) Nature Reviews Genetics]
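A common back-of-the-envelope way to see why LD decays faster in some species is Sved's (1971) approximation for the expected r² under drift and recombination, E[r²] ≈ 1/(1 + 4N_e c), where N_e is the effective population size and c the recombination fraction between the two sites. The sketch below assumes that c grows linearly with physical distance over short intervals and uses two hypothetical population-scaled rates ρ = 4N_e c per bp; it ignores mutation and sampling and is not necessarily the model fitted to the curves in the figures.

```python
def sved_expected_r2(distance_bp, rho_per_bp):
    """Sved (1971) approximation E[r^2] ~ 1 / (1 + rho * d), with rho = 4 * Ne * c per base pair."""
    return 1.0 / (1.0 + rho_per_bp * distance_bp)


# Hypothetical rates: the larger rho mimics a species with a very large Ne (e.g. a widespread conifer)
for label, rho in [("lower rho", 0.001), ("higher rho", 0.01)]:
    curve = [round(sved_expected_r2(d, rho), 3) for d in (0, 500, 1000, 2000, 3000)]
    print(label, curve)
```

With the larger ρ, expected r² drops below 0.1 within about 1 kb, the conifer-like situation: a marker must sit very close to a causal site to detect it, but a detected association then points to a narrow region.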

Page 7

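Approaches to genetic association in plants

[Figure, based on Yu & Buckler (2006) Current Opinion in Biotechnology: association methods positioned by the degree of familial relatedness and of population structure in the sample (natural populations, breeding populations, complex or unknown demography). Methods shown: GLM, GC (genomic control), SA (structured association), MLM, TDT, QTDT.]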

Page 8

Power considerations: the size of an association population

[Figure: power (0 to 1) versus % of variation explained by the QTN (0 to 50%) for N = 50, 100 and 500; simulation of a single random mating population with mutation, random genetic drift and recombination. Long & Langley (1999) Genome Research]
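The qualitative shape of these power curves can be approximated analytically. Long & Langley simulated populations in which a marker is in LD with the QTN; this simpler Python sketch instead assumes the QTN itself is genotyped, a single 1-df test, no population structure, and a hypothetical significance level α. It uses the large-sample non-centrality λ = N h/(1 - h) for a marker explaining a fraction h of the trait variance; for 1 df the non-central chi-square is (Z + √λ)² with Z standard normal, so power has a closed form.

```python
from statistics import NormalDist


def association_power(n, var_explained, alpha=0.05):
    """Power of a 1-df test for a directly genotyped QTN explaining var_explained (0 <= h < 1).

    Non-centrality ncp = n * h / (1 - h) is a large-sample approximation for single-marker regression;
    given ncp, power = P((Z + sqrt(ncp))**2 > z_crit**2) with Z ~ N(0, 1).
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    shift = (n * var_explained / (1.0 - var_explained)) ** 0.5
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


for n in (50, 100, 500):
    row = [round(association_power(n, pct / 100), 2) for pct in (1, 5, 10, 20)]
    print(f"N={n}: power at 1/5/10/20% variance explained = {row}")
```

In a real study α would be much smaller after correcting for the number of SNPs tested, which shifts all curves to the right.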

Page 9

Hirschhorn & Daly (2005) Nature Reviews Genetics

Increased rate of false positives due to population structure…

Zhao et al. (2007) PLoS Genetics

…but correcting for population structure can also produce false negatives!

[Figure (maritime pine): Moroccan, Western and Eastern haplotype groups (panels a, b, c). Sources of population structure: multiple glacial refugia, drought cline, postglacial migrations.]

Page 10

Power considerations: structured populations

[Figure: power versus % of variation explained by the QTN in structured populations, for a small association population of ~100 accessions. Zhao et al. (2007) PLoS Genetics]
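The inflation of false positives by structure is easy to reproduce in a toy simulation. This Python/NumPy sketch assumes two hypothetical subpopulations (think of two glacial refugia) whose allele frequency at a neutral marker differs (0.2 vs 0.8) and whose trait means differ by one standard deviation (e.g. along a drought cline); the marker itself has no effect on the trait. The naive regression reports a strong association; adding subpopulation membership as a covariate, as a Q matrix does, removes it. The helper `ols_t` and all parameter values are illustrative assumptions.

```python
import numpy as np


def ols_t(y, X, j):
    """t statistic of coefficient j in the OLS fit y ~ X (X already contains the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[j] / np.sqrt(cov[j, j])


rng = np.random.default_rng(1)
n_per_pop = 1000
pop = np.repeat([0, 1], n_per_pop)                 # two hypothetical subpopulations
allele_freq = np.where(pop == 0, 0.2, 0.8)         # neutral marker: frequencies differ by history only
dosage = rng.binomial(2, allele_freq).astype(float)
trait = 1.0 * pop + rng.normal(0.0, 1.0, pop.size)  # trait differs between pops; marker has NO effect

ones = np.ones_like(dosage)
t_naive = ols_t(trait, np.column_stack([ones, dosage]), 1)
t_adjusted = ols_t(trait, np.column_stack([ones, pop, dosage]), 2)
print(f"naive t = {t_naive:.1f}, with population covariate t = {t_adjusted:.1f}")
```

The same covariate also absorbs part of the signal of a real causal locus whose alleles follow the population structure, so such a locus can lose significance after correction.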

Page 11

Methods for genetic association in forest trees

• Standard general linear models (GLMs), usually with p values computed by permutation.

$y_{ij} = \mu + m_i + e_{ij}$, where $y_{ij}$ is the trait value of individual $j$ with genotype $i$, $\mu$ is the general mean, $m_i$ is the effect of the $i$-th genotype at the tested SNP and $e_{ij}$ is the residual.

• Structured Association (Pritchard et al. 2000; Thornsberry et al. 2001) and PCA Association (Price et al. 2006).

These methods control for population structure by incorporating a Q matrix, an n × p population structure incidence matrix, where n is the number of individuals assayed and p is the number of populations defined.

• Mixed Linear Models (MLMs; Yu et al. 2006).

They incorporate a Q matrix (fixed effect) and also a pairwise relatedness matrix (K matrix, random effect), which accounts for structure within populations.
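A single-marker GLM with genotype as a factor, $y_{ij} = \mu + m_i + e_{ij}$, is a one-way ANOVA, and its p value can be obtained by permuting trait values among individuals. The sketch below (Python with NumPy; `anova_f` and `permutation_p` are hypothetical helpers) applies it to SNP 1 and Trait 2 of the introductory example table. Structured association would add the Q-matrix columns as covariates in the same linear model, and the MLM additionally needs variance components for the K matrix (e.g. estimated by REML); neither is attempted here.

```python
import numpy as np


def anova_f(trait, genotypes):
    """One-way ANOVA F statistic for y = mu + m_i + e, with genotype class as a fixed factor."""
    y = np.asarray(trait, dtype=float)
    g = np.asarray(genotypes)
    classes = np.unique(g)
    grand = y.mean()
    ss_between = sum((g == c).sum() * (y[g == c].mean() - grand) ** 2 for c in classes)
    ss_within = sum(((y[g == c] - y[g == c].mean()) ** 2).sum() for c in classes)
    df_between, df_within = len(classes) - 1, len(y) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p(trait, genotypes, n_perm=999, seed=0):
    """Return (observed F, p) with p = (1 + #permuted F >= observed F) / (1 + n_perm)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(trait, dtype=float)
    observed = anova_f(y, genotypes)
    hits = sum(anova_f(rng.permutation(y), genotypes) >= observed for _ in range(n_perm))
    return observed, (1 + hits) / (1 + n_perm)


# SNP 1 genotypes and Trait 2 values of the 20 individuals in the example table
SNP1 = ["AT", "AT", "TT", "AA", "AA", "AT", "AA", "TT", "TT", "TT",
        "AT", "AA", "AA", "TT", "AT", "AT", "AT", "AA", "AA", "TT"]
TRAIT2 = [3, 4, 7, 1, 3, 5, 2, 8, 7, 10, 6, 4, 1, 9, 6, 4, 5, 2, 1, 8]
F, p = permutation_p(TRAIT2, SNP1)
print(f"F = {F:.1f}, permutation p = {p:.4f}")
```

Permuting raw trait values matches the null hypothesis of this covariate-free model; once Q covariates are included, naive permutation of the trait no longer does.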

Page 12

• Family-based methods (Transmission Disequilibrium Test, TDT; its quantitative version, QTDT; and their several extensions).

Parents must be heterozygous to be informative.

Only a few to a moderate number of genetic backgrounds are tested.
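The classic TDT (for a binary trait in case-parent trios) shows why only heterozygous parents are informative: the statistic counts, over heterozygous parents only, how often each allele was transmitted, χ² = (b - c)²/(b + c) with 1 df. QTDT and other quantitative extensions, as used for traits such as WUE, model within-family genotype effects instead and are not shown. A minimal Python sketch with hypothetical transmission counts; `tdt` is an illustrative helper:

```python
import math


def tdt(transmissions, allele):
    """Classic TDT for a biallelic SNP.

    transmissions: iterable of (parent_genotype, transmitted_allele) pairs, e.g. ("AG", "A").
    Returns (b, c, chi2, p): b and c count how often heterozygous parents transmitted
    `allele` and the other allele; chi2 = (b - c)**2 / (b + c) with 1 df.
    """
    b = c = 0
    for parent, transmitted in transmissions:
        if parent[0] == parent[1]:
            continue  # homozygous parents transmit the same allele either way: uninformative
        if transmitted not in parent:
            raise ValueError(f"{transmitted!r} cannot be transmitted by {parent!r}")
        if transmitted == allele:
            b += 1
        else:
            c += 1
    if b + c == 0:
        raise ValueError("no heterozygous parents: the TDT is undefined")
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of a chi-square with 1 df
    return b, c, chi2, p


# Hypothetical data: 40 transmissions from AG parents plus 25 uninformative AA parents
data = [("AG", "A")] * 30 + [("AG", "G")] * 10 + [("AA", "A")] * 25
print(tdt(data, "A"))
```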

FBRC association population in loblolly pine

González-Martínez et al. (2008) Heredity

Partial diallel including 15-24 offspring from 61 families. Association with WUE (isotope discrimination at two sites).

[Figure: trait values (-0.8 to 0.8) by genotype and family for DHN1-S2]

Page 13

Corrections for multiple testing

• Experiment-wise permutation

• Bonferroni (α/k, with k = the number of tests)

• False Discovery Rate (FDR)

Storey & Tibshirani (2003) PNAS

FDR: the expected proportion of false positives among all significant tests
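Bonferroni and FDR control differ in what they bound: Bonferroni limits the chance of any false positive among k tests, whereas FDR limits the expected proportion of false positives among significant tests. The sketch below implements Bonferroni and Benjamini-Hochberg adjusted p values; these equal Storey & Tibshirani's q-values when the proportion of true null hypotheses, π0, is fixed at 1 (their method estimates π0 from the data, which makes it less conservative). The p values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag tests significant at the Bonferroni threshold alpha / k."""
    k = len(pvalues)
    return [p <= alpha / k for p in pvalues]


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p values (Storey q-values with pi0 = 1), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down, keeping q monotone
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print("Bonferroni significant:", sum(bonferroni(pvals)))
print("FDR <= 0.05 significant:", sum(q <= 0.05 for q in bh_qvalues(pvals)))
```

Here Bonferroni keeps one test and FDR at 5% keeps two, which illustrates why FDR is often preferred when many correlated SNPs are tested.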

Page 14

Permutation tests (Hirschhorn and Daly 2005)
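Experiment-wise permutation can be implemented with the max-statistic method: permute the phenotype, recompute the statistic for every SNP, keep the maximum, and compare each observed statistic with that distribution of maxima. Because whole genotype vectors are kept together, the LD among SNPs is preserved, which Bonferroni ignores. A Python/NumPy sketch, assuming allele dosages (0/1/2) with no monomorphic SNPs and r² as the per-SNP statistic; `max_stat_permutation` is a hypothetical helper and the data are simulated:

```python
import numpy as np


def max_stat_permutation(genotypes, trait, n_perm=999, seed=0):
    """Experiment-wise adjusted p values via max-statistic permutation.

    genotypes: (n individuals x k SNPs) array of dosages; trait: length-n array.
    Per-SNP statistic: r^2 between dosage and trait (SNPs must not be monomorphic).
    """
    rng = np.random.default_rng(seed)
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)

    def r2_all(y):
        z = (y - y.mean()) / y.std()
        return (g.T @ z / len(z)) ** 2  # Pearson r of each SNP with y, squared

    observed = r2_all(trait)
    max_null = np.array([r2_all(rng.permutation(trait)).max() for _ in range(n_perm)])
    exceed = (max_null[None, :] >= observed[:, None]).sum(axis=1)
    return observed, (1 + exceed) / (1 + n_perm)


rng = np.random.default_rng(42)
n, k = 200, 20
genos = rng.binomial(2, 0.3, size=(n, k)).astype(float)
trait = 0.8 * genos[:, 0] + rng.normal(size=n)  # hypothetical: only SNP 0 affects the trait
r2, p_adj = max_stat_permutation(genos, trait)
print("adjusted p, first five SNPs:", np.round(p_adj[:5], 3))
```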

Page 15

Some examples

• Monolignol biosynthesis and cell-wall related genes: González-Martínez et al. (2007) Genetics

• Drought tolerance: Collada et al. (in prep.)

Page 16

Pinus taeda L.: continuous range, no clear population genetic structure (ADEPT project)

Pinus pinaster Ait.: fragmented range, significant population structure (TREESNIPS project; also P. sylvestris, Picea abies and oaks); 22 populations

[Map: Pinus pinaster geographic range (France, Spain, Portugal, Morocco, Tunisia) with the sampled populations, including Tamrabta, Tabarka, Cuellar, Bayubas de Abajo, Coca, San Leonardo de Yagüe, Valdemaqueda, Arenas de San Pedro, San Cipriano, Cenicientos, Petrock, Le Verdon, Olonne/Mer, Hourtin, Mimizan, St Jean de Monts, Pleucadec, Erdeven, Ahin, Oria, Pineta, Aulenne, Restonica and Pinia]

Page 17

Genetic association with wood property traits

González-Martínez et al. 2007 Genetics

Phenotypic traits

[Figure: cell wall layers (primary wall; secondary wall with S1, S2 and S3 layers) and microfibril angle]

• Earlywood specific gravity (ewsg)
• Latewood specific gravity (lwsg)
• Percent latewood (lw)
• Earlywood microfibril angle (ewmfa)
• Lignin & cellulose content (lgn-cel)
• Synthetic PCAs for different wood-age types

SNP genotyping

FP-TDI platform: 58 SNPs from 20 wood- and drought-related candidate genes.

[Figure: FP-TDI genotype clusters, R110 (mP) on the x axis versus TAMRA (mP) on the y axis]

Page 18

Significant genetic association of the cad gene with earlywood specific gravity and of 4cl with percent latewood

[Figure: gene models of 4cl and cad (cinnamyl alcohol dehydrogenase) with amplicon primers, coordinates and genotyped SNPs; SNPs tested but not giving significant associations are marked separately. cad SNPs M28 (position 16 bp) and M29 are highlighted, with the two protein variants MGSLESEKTV […] SPMKHFGMTEP and MGSLETEKTV […] SPMKHFAMTEP (residue positions 10 and 180 indicated).]

Page 19

Genetic association with WUE

Phenotypic traits

Provenance-progeny combined tests at two sites: Cálcena (central Spain) & Bordeaux (southwestern France)

• Isotope discrimination (WUE)
• Growth (height, diameter, annual increments)
• Biomass (total and aerial)
• Ontogeny scores
• Survival

SNP genotyping

Pyrosequencing: relatively high genotyping error.

Collada et al. (in prep.)

Page 20

Central/marginal pairs

[Figure: haplotype alignment for a candidate gene, with a Pinus taeda sequence for comparison; observed haplotype counts 1, 6, 5, 10, 29, 1, 2, 1, 1, 1 and 1]

pr-agp4: 470 bp (0.1469), 1062 bp (0.000999), 1069 bp (0.000999)

dhn1: 116 bp (0.0131), 71 bp (0.2188)

ccoaomt: 1229 bp (0.0699)

erd3: 92 bp (0.4256)

dhn2: 248 bp (0.4286), 254 bp (0.2927), 259 bp (0.3646), 293 bp (0.3457)

lp3-3: 43 bp (0.4605), 69 bp (0.7373), 75 bp (0.027), 223 bp (0.3377), 267 bp (0.9071), 272 bp (0.4366)

rd21: 3 bp (0.7313)

[Figure: BLUEs of isotope discrimination (population effect removed, -0.20 to 0.20) by genotype (TT, GT, GG); average for TT: 0.0034, average for GT: -0.0407]

agp4: GLMs with population as a factor

Page 21

Tassel demo

Page 22

R SNPassoc package demo

Page 23

Perspectives on genetic association in forest trees

• Enormous potential, but still many technical challenges ahead: optimizing SNP genotyping platforms, dealing with recently evolved gene families, building large unstructured association populations, transferring information to non-model species, etc.

• Linking genotype to phenotype through association genetics works well for well-known metabolic pathways, and genome-wide approaches are now in place for some species, such as loblolly pine. As large-scale association studies develop, more complex questions can be addressed: gene interactions, heterosis, plasticity (G × E), etc.

• Apart from industry applications, given the ecosystem-wide importance of forest trees, genetic association will have a strong influence on evolutionary and ecological research.

Page 24

Absence of trans-specific SNPs between P. pinaster and P. taeda, two pine species separated by ~120 Myr

[Figure: lp3-3 amplicon in P. pinaster, primers F1 and R1; positions 0, 185, 352 and 406 marked]

Alignment columns (nt positions): 43, 44, 55, 59, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 81, 85, 87, 91, 97, 106, 115, 127, 134, 143, 156, 158, 161, 188, 196, 198, 199, 200, 201, 204, 223, 235, 236, 246, 267, 272, 298, 318, 319, 330, 363

Hap_1 C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T T T A A G A T A C

Hap_2 C G C G G G A G G T G A A G A G T G A G T G C G A C C T G G G C C G A T C C T T T T C T C A T A C

Hap_3 C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T T T C T C A T A C

Hap_4 C G C G G G A G G A G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T C T A A G A T A C

Hap_5 C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C A T T T A A G A T A C

Hap_6 T G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C A T T T A A G A T A C

Hap_7 C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T T T A A G A T A T

Hap_8 C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T T T C A G A T A C

Hap_A C G T A - - - - - - - - - - - - C A T T C T T A G T A G G A A - T A - - - T T C T C A A G A C G C

Hap_B C T T A - - - - - - - - - - - - C A T T C T T A G T A G G A A - T A - - - T T C T C A A G A C G C

Hap_C C G T A - - - - - - - - - - - - C A T T C T T A G T A G A A A - T A - - - T T C T C A A G A C G C

Hap_D C G T A - - - - - - - - - - - - C A T T C T T A G T A G G A A - T A - - - T T C T C A A G G C G C

Hap_1 to Hap_8: P. pinaster; Hap_A to Hap_D: P. taeda

Gene: ABA- and WDS-induced gene 3 (lp3-3)

Average Ks between P. pinaster and P. taeda: ~2%
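The absence of trans-specific SNPs can be checked directly on the aligned haplotypes. This Python sketch transcribes Hap_1 to Hap_8 (assumed to be P. pinaster) and Hap_A to Hap_D (assumed to be P. taeda), together with the alignment-column positions as reconstructed from the garbled column header (nt_43 to nt_363). It lists the sites that are polymorphic within each species, ignoring gaps, and reports sites polymorphic in both species for the same pair of nucleotides; `variable_sites` is a hypothetical helper.

```python
POSITIONS = [43, 44, 55, 59, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 81, 85,
             87, 91, 97, 106, 115, 127, 134, 143, 156, 158, 161, 188, 196, 198, 199, 200, 201,
             204, 223, 235, 236, 246, 267, 272, 298, 318, 319, 330, 363]
PINASTER = [  # Hap_1 to Hap_8
    "C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T T T A A G A T A C",
    "C G C G G G A G G T G A A G A G T G A G T G C G A C C T G G G C C G A T C C T T T T C T C A T A C",
    "C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T T T C T C A T A C",
    "C G C G G G A G G A G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T C T A A G A T A C",
    "C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C A T T T A A G A T A C",
    "T G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C A T T T A A G A T A C",
    "C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T T T A A G A T A T",
    "C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T T T C A G A T A C",
]
TAEDA = [  # Hap_A to Hap_D
    "C G T A - - - - - - - - - - - - C A T T C T T A G T A G G A A - T A - - - T T C T C A A G A C G C",
    "C T T A - - - - - - - - - - - - C A T T C T T A G T A G G A A - T A - - - T T C T C A A G A C G C",
    "C G T A - - - - - - - - - - - - C A T T C T T A G T A G A A A - T A - - - T T C T C A A G A C G C",
    "C G T A - - - - - - - - - - - - C A T T C T T A G T A G G A A - T A - - - T T C T C A A G G C G C",
]


def variable_sites(haplotypes, positions):
    """Map nt position -> set of nucleotides for aligned columns with >1 nucleotide (gaps ignored)."""
    rows = [h.split() for h in haplotypes]
    assert all(len(r) == len(positions) for r in rows), "haplotypes and positions must be aligned"
    sites = {}
    for col, pos in enumerate(positions):
        alleles = {r[col] for r in rows} - {"-"}
        if len(alleles) > 1:
            sites[pos] = alleles
    return sites


pin = variable_sites(PINASTER, POSITIONS)
tae = variable_sites(TAEDA, POSITIONS)
# A trans-specific SNP: polymorphic in both species for the same two nucleotides
shared = sorted(pos for pos in pin if pos in tae and len(pin[pos] & tae[pos]) >= 2)
print("P. pinaster SNPs:", sorted(pin))
print("P. taeda SNPs:", sorted(tae))
print("Trans-specific SNPs:", shared)
```

On these haplotypes, P. pinaster varies at nt 43, 69, 75, 223, 236, 267, 272, 298 and 363 and P. taeda at nt 44, 156 and 318, so the list of shared polymorphic sites is empty.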

Page 25

Acknowledgements

TREESNIPS (for maritime pine: C. Collada, E. Eveno, M.A. Guevara, A. Booth, A. Soto, C. Plomion, L. Díaz, S. McCallum, I. Aranda, O. Brendel, R. Alía, V. Leger, J. Brach, J. Russell, P.H. Garnier-Géré, M.T. Cervera)

ADEPT & ADEPT2 (N.C. Wheeler, E. Ersoz, G.R. Brown, G.P. Gill, R.J. Kuntz, J.A. Beal, J. Manares, D. Huber, J. Davis, B. Pande, J. Lee, A. Eckert, J. Wegrzyn, C.D. Nelson)

FUNDING AGENCIES (NSF, CSREES-USDA, EU, MEC-Spain)

and, of course, all of you!