VISG – LARGE DATASETS Literature Review Introduction – Genome Wide Selection Aka Genomic...

Page 2

VISG – LARGE DATASETS

Literature Review

Page 3

Introduction – Genome Wide Selection

Aka Genomic Selection

Set of Markers
• 10,000s: enough to capture most genetic information

‘Training set’ of animals
• phenotyped and genotyped
• representative of industry

Predictor
• Over-specified – e.g. 10,000 variables, 1,000 individuals
• Robust model selection required

Application
• Predict in selection candidates
– Maybe no phenotypes
– Maybe no pedigrees
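The over-specification point can be illustrated with a small numerical sketch. The dimensions below are a scaled-down stand-in for the slide's 10,000 variables and 1,000 individuals, and the 0/1/2 genotype coding with allele frequency 0.5 is an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scaled-down stand-in for 10,000 markers on 1,000 animals (assumed sizes).
n_animals, n_markers = 50, 500

# Assumed coding: 0/1/2 copies of one allele, allele frequency 0.5.
X = rng.binomial(2, 0.5, size=(n_animals, n_markers)).astype(float)

# The rank of X cannot exceed the number of animals, so X'X
# (n_markers x n_markers) is singular and ordinary least squares
# has no unique solution for the marker effects.
rank = np.linalg.matrix_rank(X)
print(rank, "<=", n_animals, "<", n_markers)
```

With more unknowns than records, some form of shrinkage or variable selection is needed before marker effects can be estimated, which is why the slide asks for robust model selection.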

Page 4

Introduction – Genome Wide Selection

Prediction Methods

• Stepwise Regression
• gBLUP
– Fit all markers as a random effect
– $g_i \sim N(0, \sigma_g^2)$
• BayesA
– $g_i \sim N(0, \sigma_{g_i}^2)$
– prior: $\sigma_{g_i}^2 \sim S/\chi_\nu^2$ (choose $S$ and $\nu$)
• BayesB
– similar to BayesA, except
– proportion of effects are zero

Most investigations compare these
Many variations (sometimes with the same name)
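As a minimal sketch of the "fit all markers as a random effect" idea, the function below solves the ridge-type equations that arise when every marker effect shares one common variance (written sigma_g^2 here). The function name `snp_blup`, the simulated data, and the value of `lam` (the ratio sigma_e^2 / sigma_g^2, treated as known) are all assumptions for illustration; this is not the software used in the reviewed studies.

```python
import numpy as np

def snp_blup(Z, y, lam):
    """Marker effects under one common variance (hypothetical helper).

    lam is the ratio sigma_e^2 / sigma_g^2, assumed known here.
    """
    m = Z.shape[1]
    mu_hat = y.mean()
    Zc = Z - Z.mean(axis=0)              # centre each genotype column
    # Solve (Zc'Zc + lam * I) g = Zc'(y - mu_hat)
    lhs = Zc.T @ Zc + lam * np.eye(m)
    rhs = Zc.T @ (y - mu_hat)
    g_hat = np.linalg.solve(lhs, rhs)
    return mu_hat, g_hat

# Simulated example: more markers than animals, many small effects.
rng = np.random.default_rng(1)
Z = rng.binomial(2, 0.5, size=(100, 300)).astype(float)
true_g = rng.normal(0.0, 0.1, size=300)
y = 10.0 + Z @ true_g + rng.normal(0.0, 1.0, size=100)

mu_hat, g_hat = snp_blup(Z, y, lam=100.0)
```

Because `lam` is added to the diagonal, the system is solvable even with more markers than animals. BayesA and BayesB differ in giving each marker its own variance, which this sketch does not attempt.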

Page 5

Literature

Dairy applications review (Hayes et al., 2009)
GWS in crops (Heffner, Sorrells, Jannink, 2009)
Prediction in unrelateds (Meuwissen, 2009)
Marker panels (Habier, 2009)
Phenotypes (Harris & Johnson, unpub)
+ ...

Page 6

Issues

National evaluations
Long term gains
LD or relationship tracking
Multiple breeds
Distance from Training to Application
Marker Panels (subsets)
Phenotypes (EBV-based)
Non-additive effects
Computing requirements

Page 7

Methods

gBLUP almost as good as Bayes(A) (dairy)
• Interpretation(?): many genes of small effect

Bayes methods better at using real LD (vs relatedness)

Bayes(B) advantage greater with
• Higher marker density
• Higher Training-to-Application distance
• Smaller Training set

Mixture of 2 normals ~ BayesB
Partial Least Squares
Machine Learning
Haplotype methods not used in practice yet
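The resemblance between a two-normal mixture and BayesB can be seen by drawing marker effects from each kind of prior. This is a simplified sketch: the proportion `pi_zero`, the standard deviations, and the use of a single normal for the non-zero class are assumptions chosen for illustration, not values from the reviewed papers.

```python
import numpy as np

rng = np.random.default_rng(2)
n_markers = 10_000
pi_zero = 0.95                      # assumed share of markers with no effect

# Same class membership is used for both priors so they are comparable.
nonzero = rng.random(n_markers) >= pi_zero

# BayesB-style draw: effects in the "zero" class are exactly zero.
# (Simplification: one normal for the non-zero class.)
g_spike = np.where(nonzero, rng.normal(0.0, 1.0, n_markers), 0.0)

# Two-normal mixture: the "zero" class gets a very small variance instead.
small_sd, large_sd = 0.01, 1.0      # assumed values
g_mix = np.where(
    nonzero,
    rng.normal(0.0, large_sd, n_markers),
    rng.normal(0.0, small_sd, n_markers),
)

print("exactly zero (spike):", np.mean(g_spike == 0.0))
print("near zero (mixture): ", np.mean(np.abs(g_mix) < 0.05))
```

Under both priors most markers contribute almost nothing and a few carry large effects; the mixture simply replaces the point mass at zero with a narrow normal.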

Page 8

Marker Panels

Evenly spaced panels
• Track inheritance from parents (both SNP-chipped)
• Will work with new traits

Lasso methods popular
• Shrinks small effects to zero
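How the lasso sets small effects exactly to zero can be sketched with the soft-thresholding rule, which is the lasso solution in the special case of an orthonormal design. The effect values and the penalty `lam` below are made up for illustration.

```python
import numpy as np

def soft_threshold(b, lam):
    """Move each estimate towards zero by lam; estimates within lam of zero become zero."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

# Hypothetical unpenalised marker effects.
ols_effects = np.array([-2.0, -0.3, 0.05, 0.4, 3.0])
shrunk = soft_threshold(ols_effects, lam=0.5)

# Markers whose shrunk effect is non-zero would form the reduced panel.
selected = np.flatnonzero(shrunk)
```

Here only the first and last markers survive, which is the sense in which lasso-type methods can be used to pick a marker subset.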

Page 9

Other

Combining marker and other information
• Phenotype info, parent info
• Index methods; ‘blending’
• Important for seamless national evaluations

Computing strategies
• Tricks to reduce computation
• Approximation rather than Iterative (MCMC) methods

Page 10

Online resources

Conferences
• Statistical Genetics of Livestock for the Post-Genomic Era. UW-Madison, May, 2009. http://dysci.wisc.edu/sglpge/index.html
• QTL/MAS Workshops.
– 2008: http://www.computationalgenetics.se/QTLMAS08
– 2009: http://www.qtlmas2009.wur.nl/UK/

Courses
• Whole Genome Association and Genomic Selection. September 1-8, 2008, Salzburg, Austria. http://www.nas.boku.ac.at/12100.html?&L=0
• Use of High-density SNP Genotyping for Genetic Improvement of Livestock. Iowa State, June, 2009. http://www.ans.iastate.edu/stud/courses/short/

Page 27

Toy example

• 5 SNP / 1000 individuals
• y = mu + SNP1 + e
– mu = 10
– SNP1 substitution effect = 10 / p = 0.5
– Var(e) = 1
• 1 block / 1000 iterations
• Runs in ~ 5 secs
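A data set like the toy example could be simulated along the following lines. This is a sketch under stated assumptions: genotypes are coded 0/1/2, p = 0.5 is read as the allele frequency (the slide does not say so explicitly), and the least-squares fit at the end is only a sanity check on the simulated data, not the iterative sampler whose output the following slides show.

```python
import numpy as np

rng = np.random.default_rng(3)

n_ind, n_snp = 1000, 5
mu, effect, var_e = 10.0, 10.0, 1.0
freq = 0.5                          # assumption: p = 0.5 is the allele frequency

# Assumed coding: number of copies (0/1/2) of one allele at each SNP.
snp = rng.binomial(2, freq, size=(n_ind, n_snp)).astype(float)
e = rng.normal(0.0, np.sqrt(var_e), size=n_ind)

# Only SNP1 (column 0) affects the phenotype.
y = mu + effect * snp[:, 0] + e

# Sanity check by ordinary least squares; with 5 SNPs and 1000
# individuals the model is not over-specified.
X = np.column_stack([np.ones(n_ind), snp])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept:", coef[0])
print("SNP effects:", coef[1:])
```

The estimates should land close to mu = 10 for the intercept, 10 for SNP1, and zero for the other four SNPs.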

Page 28

[Figure: trace plots against iteration (axis 200 to 1000) for b.samp, z.samp, mu.samp and var.b.samp]

Page 29

[Figure: density plots (N = 900 each) for b 1 to b 5, mu and var.b 2, with count plots for z 1 to z 5]

Page 30

[Figure: trace plot of p against iteration (axis 200 to 1000), and density plots (N = 900 each) for p 1, p 2 and p 3]
