
Page 1: How large-scale genomic evaluations are possiblence.ads.uga.edu/...media=lourenco_pag2017_day14.pdf · How large-scale genomic evaluations are possible Daniela Lourenco 05-24-2018

How large-scale genomic evaluations are possible

Daniela Lourenco

05-24-2018


How big is your genomic data?

500,000

http://www.angus.org/AGI/default.aspx 2,000,000

http://www.holsteinusa.net/programs_services/backgrounds.html
https://www.usjersey.com/AJCA-NAJ-JMS/AJCA/AnimalIdentificationServices/HerdRegister.aspx

255,000 http://sesenfarm.com/raising-pigs/

250,000

17 Gb

15 Gb 26 Gb

130 Gb


Large-scale data

Can you use the same methods?

yes

no

Data is not large
But it may grow

Do you need new algorithms?

yes

large-scale problems

large-scale solutions

Do you have big data?


How large is your genomic data?

large-scale problems

large-scale solutions

Large-scale data
§ 150,000
§ >2 hours
§ > 700 Gb RAM

ssGBLUP


Solution for large-scale evaluations

Problem to invert A back in 1963

BLUP MME was “sleeping”

[Images: papers from 1963, 1976, and 1988 (J. Dairy Sci.)]

Solution for large-scale Genomic evaluations

§ Recursions for A⁻¹

Henderson (1976); Quaas (1988)

$A^{-1} = (I - P)' M^{-1} (I - P)$
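The recursion A⁻¹ = (I − P)′ M⁻¹ (I − P) can be illustrated with a minimal NumPy sketch. It assumes a small hypothetical pedigree with no inbred parents, so that the Mendelian sampling variances in M are 1, 0.75, or 0.5 depending on the number of known parents. The function names and the dense matrices are illustrative only; production software would use sparse storage.

```python
import numpy as np

def a_inverse(pedigree):
    """A^-1 = (I - P)' M^-1 (I - P), ignoring inbreeding (assumption).

    pedigree: list of (sire, dam), 1-based ids, 0 = unknown parent,
    ordered so that parents come before their offspring.
    """
    n = len(pedigree)
    P = np.zeros((n, n))      # 0.5 in the columns of known parents
    m = np.ones(n)            # diagonal of M (Mendelian sampling variance)
    for i, (sire, dam) in enumerate(pedigree):
        for parent in (sire, dam):
            if parent > 0:
                P[i, parent - 1] = 0.5
                m[i] -= 0.25
    t_inv = np.eye(n) - P
    return t_inv.T @ np.diag(1.0 / m) @ t_inv

def tabular_a(pedigree):
    """Numerator relationship matrix A by the tabular method (for checking)."""
    n = len(pedigree)
    A = np.zeros((n, n))
    for i, (s, d) in enumerate(pedigree):
        for j in range(i):
            a = 0.0
            if s > 0:
                a += 0.5 * A[j, s - 1]
            if d > 0:
                a += 0.5 * A[j, d - 1]
            A[i, j] = A[j, i] = a
        A[i, i] = 1.0 + (0.5 * A[s - 1, d - 1] if s > 0 and d > 0 else 0.0)
    return A

# Hypothetical pedigree: animals 1-3 are founders, none of the parents is inbred.
ped = [(0, 0), (0, 0), (0, 0), (1, 2), (4, 3), (5, 0)]
A_inv = a_inverse(ped)
```

The point of the recursion is that A⁻¹ is assembled directly from the pedigree, without ever forming or inverting A; `tabular_a` is included only to check the result on a toy example.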

§ Recursions for G⁻¹

§ Split genotyped animals into core and non-core

$u_i \mid u_1, u_2, \ldots, u_{i-1} = \sum_{j \in \text{core}} p_{ij} u_j + e_i$ ("proven" animals = core)

Misztal et al. (2014); Misztal (2016)

APY - Algorithm for Proven and Young


Algorithm for Proven and Young (APY)

core

non-core


Algorithm for Proven and Young (APY)

G → G⁻¹

APY G → APY G⁻¹

§ APY G⁻¹ sparse
§ Efficient computation
§ Why does it work?
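A minimal NumPy sketch of how a sparse APY G⁻¹ can be assembled, following the block formula usually attributed to Misztal et al. (2014): only the core block of G is inverted directly, and the non-core block of the inverse is diagonal. The function name, the dense output matrix, and the toy G are assumptions for illustration, not the BLUPF90 implementation.

```python
import numpy as np

def apy_inverse(G, core):
    """APY inverse of G given the indices of the core animals (sketch)."""
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])   # only dense inversion needed
    Gcn = G[np.ix_(core, noncore)]
    Pcn = Gcc_inv @ Gcn                               # regressions of non-core on core

    # Diagonal M_nn: m_i = g_ii - g_ic Gcc^-1 g_ci for each non-core animal i
    m = np.diag(G)[noncore] - np.einsum("ij,ij->j", Gcn, Pcn)

    G_inv = np.zeros((n, n))
    G_inv[np.ix_(core, core)] = Gcc_inv + (Pcn / m) @ Pcn.T
    G_inv[np.ix_(core, noncore)] = -Pcn / m
    G_inv[np.ix_(noncore, core)] = (-Pcn / m).T
    G_inv[noncore, noncore] = 1.0 / m                 # diagonal non-core block
    return G_inv

# Toy example (hypothetical data): 6 animals, 20 markers, 3 core animals.
rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 20))
G = Z @ Z.T / 20 + 0.01 * np.eye(6)
G_apy_inv = apy_inverse(G, core=[0, 1, 2])
```

With a single non-core animal the APY inverse equals the regular inverse, which gives a quick sanity check. In real use the non-core block would be stored as a vector rather than inside a dense matrix.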


APY and dimension of G

# genotyped animals > # SNP

G = ⍺G + (1-⍺)A22 VanRaden (2008)

G has a limited dimensionality

Dimension of G = min (#animals, # independent SNP, Me)

independent blocks

Dependent blocks


APY and dimension of G

E(Me) = 4NeL, Stam (1980)

Ne ?

Dimension of G

Number of core animals
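As a small worked example of E(Me) = 4NeL, the sketch below computes the expected number of independent chromosome segments for a few effective population sizes. The genome length of 30 Morgans and the Ne values are hypothetical inputs chosen only for illustration.

```python
def expected_me(ne, genome_length_morgans):
    """Expected number of independent chromosome segments, E(Me) = 4 * Ne * L."""
    return 4 * ne * genome_length_morgans

# Hypothetical inputs: L = 30 Morgans, three effective population sizes.
GENOME_LENGTH = 30
me_by_ne = {ne: expected_me(ne, GENOME_LENGTH) for ne in (40, 100, 200)}
```

Under these assumptions, a larger Ne implies more independent segments, and therefore a higher dimension of G and more core animals.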


How many core animals in APY?

§ Simulated populations
§ Ne = 20, 40, 80, 120, 160, 200
§ #genotyped animals = 75,000

§ Dimensionality of G as number of largest eigenvalues of G

§ #eigenvalues vs. Ne

§ #eigenvalues as #core animals in APY ssGBLUP

§ cor(TBV,GEBV_APY) vs. cor(TBV,GEBV)
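One way to read "dimensionality of G as number of largest eigenvalues" is to count how many of the largest eigenvalues are needed to explain a given fraction of the total variance. The sketch below does this with NumPy; the function name and the 98% default are assumptions for illustration, and the toy matrix is hypothetical.

```python
import numpy as np

def n_eigen_for_variance(G, fraction=0.98):
    """Number of largest eigenvalues of symmetric G explaining `fraction` of variance."""
    eigvals = np.linalg.eigvalsh(G)[::-1]        # sorted from largest to smallest
    eigvals = np.clip(eigvals, 0.0, None)        # guard against tiny negative values
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Index of the first cumulative share that reaches the requested fraction
    return int(np.searchsorted(cumulative, fraction) + 1)

# Toy matrix with known eigenvalues 60, 30, 9, 0.5, 0.5 (sum = 100).
G_toy = np.diag([60.0, 30.0, 9.0, 0.5, 0.5])
n_core_candidate = n_eigen_for_variance(G_toy, 0.98)
```

The returned count can then be tried as the number of core animals in APY. For matrices with hundreds of thousands of animals a full eigendecomposition of G is itself expensive, so this sketch only conveys the idea.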

How many core animals in APY?

How many core animals in APY?

[Figure labels: Regular G⁻¹; APY G⁻¹ with NeL, 2NeL, and 4NeL]

How many core animals in APY?

Dimensionality of G depends on Ne

Number of core animals = # largest eigenvalues explaining 98% of the variance in G

Accuracy ≥ that with regular G⁻¹

How many core animals in APY?

Real Livestock Populations

77k genotyped, 61k SNP, 10M pedigree

75k genotyped, 61k SNP, 2.5M pedigree

81k genotyped, 38k SNP, 8M pedigree

23k genotyped, 37k SNP, 2.5M pedigree

16k genotyped, 39k SNP, 200k pedigree

How many core animals in APY?

# largest eigenvalues of G explaining 98% ~ 99% variance

14k ~ 19k

11k ~ 16k

11k ~ 14k

4k ~ 6k

4k ~ 6k


How many core animals in APY?

eigen98% = 14k

77k genotyped = 14k core + 63k non-core (random choice)

Cor(GEBV, GEBV_APY) > 0.99

How many core animals in APY?

What happens if more animals are genotyped?
Should we change the core number over time?

[Figure labels: Optimum APY G⁻¹; Regular G⁻¹; 4.5k; 8k; 14k; 19k; 95% APY G⁻¹]

How many core animals in APY?

Core            Cor(G⁻¹, APY G⁻¹), genotyped
Random 10%      0.98
Oldest 10%      0.93
Youngest 10%    0.93

§ Ostersen et al. (2016)

§ 21k genotyped pigs

# core animals < ideal number from Pocrnic et al. (2016)
# weak links to recent population
# core has no phenotypes

Which core animals in APY?
Bradford et al. (2017)

§ Simulated populations (QMSim; Sargolzaei and Schenkel, 2009)

§ Ne = 40
§ #genotyped animals = 50,000

§ Core animals:
§ Random gen 6 || gen 7 || gen 8 || gen 9 || gen 10 (y)
§ Random all generations
§ Incomplete pedigree
§ Genotypes in gen 9 and 10 imputed with 98% accuracy

Which core animals in APY?

[Figure: Accuracy. y-axis: Correlation (GEBV, TBV), 0 to 1; x-axis: percent of variation explained in G (98%, 95%, 90%); series: Gen 6, Gen 7, Gen 8, Gen 9, Gen 10, Random, G⁻¹]

Which core animals in APY?

[Figure: Rounds to Convergence, axis 0 to 700; categories: G, Gen 6, Gen 7, Gen 8, Gen 9, Gen 10, Random]

Which core animals in APY?

Imputation with 2% error

[Figure: Accuracy. y-axis: Correlation (EBV, TBV), 0 to 1; x-axis: Complete data, 100% young imputed; series: G, Gen 6, Gen 7, Gen 8, Gen 9, Gen 10, Random]

Which core animals in APY?

80% genotyped animals with missing pedigree

[Figure: Accuracy. y-axis: Correlation (EBV, TBV), 0 to 1; x-axis: Complete data, 80% genotyped; series: G, Gen 6, Gen 7, Gen 8, Gen 9, Gen 10, Random]

Which core animals in APY?

if (sire != 0 .and. dam != 0) then
    core = any definition
else
    core = random, all generations represented
end if
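The rule above can be sketched in Python as follows. The function `choose_core`, its arguments, and the toy pedigree are hypothetical; the sketch assumes that "random" means a simple random sample over all genotyped animals, so that every generation can be represented.

```python
import random

def choose_core(pedigree, n_core, seed=0, rule=None):
    """Pick core animals for APY (illustrative sketch).

    pedigree: dict animal_id -> (sire, dam) for genotyped animals, 0 = unknown.
    rule: optional function (ids, n_core) -> list, used only when every
          animal has both parents known ("any definition").
    """
    ids = sorted(pedigree)
    complete = all(s != 0 and d != 0 for s, d in pedigree.values())
    if complete and rule is not None:
        return rule(ids, n_core)
    # Incomplete pedigree (or no rule given): random core across all animals
    return random.Random(seed).sample(ids, n_core)

# Toy data: animal 4 has an unknown dam, so the core is drawn at random.
genotyped = {3: (1, 2), 4: (1, 0), 5: (3, 4), 6: (3, 4), 7: (5, 6)}
core = choose_core(genotyped, n_core=3, seed=1)
```

Fixing the seed keeps the core stable between runs, which may matter when evaluations are repeated over time.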


Do I need APY?

if (number_genotyped > 50000) then
    APY = True
else
    APY = False
end if

• APY is just an algorithm to construct G⁻¹ when inverting G is not computationally feasible

How fast is APY?

PWG 82k – Random Core

Core animals: 5k, 10k, 15k, 20k
Correlation (GEBV, GEBV_APY): 0.97, 0.99, 0.99, 0.99

APY = 28 min vs. regular inversion = 213 min

16 Gb vs. 230 Gb

Lourenco et al., 2015

Largest evaluations with APY?

§ American Angus

§ 500k genotyped animals
§ 19k core
§ all traits
§ ~ 2 hours (G⁻¹ APY)

UGA & Collaborators

§ US Holsteins

§ ~500k genotyped animals
§ several traits

Spring/2017

Fall/2016

Largest evaluations with APY?

§ US Holsteins

§ 760k genotyped animals

§ 14k core

§ 23M pedigree

§ 37M phenotypes

§ M / F / P

§ ~ 74 Gb RAM

Masuda et al. (2016)

APY with 2M genotyped animals?

§ Should we include all genotyped animals?

§ US Holsteins 2M

§ > 75% female LD + imputation + missing ped?

§ Is it feasible?

§ 14k core

§ G⁻¹ = 29 Tb vs. APY G⁻¹ = 208 Gb
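The 29 Tb and 208 Gb figures can be approximately reproduced with simple arithmetic. The sketch below assumes 8-byte values, binary units (2^40 and 2^30 bytes), a full dense G⁻¹, and an APY G⁻¹ stored as a dense core block, a dense core by non-core block, and a non-core diagonal. These storage assumptions are mine, not stated on the slide.

```python
def dense_inverse_bytes(n, bytes_per_value=8):
    """Memory for a full n x n inverse stored densely."""
    return n * n * bytes_per_value

def apy_inverse_bytes(n, n_core, bytes_per_value=8):
    """Memory for APY G^-1: core block + core x non-core block + non-core diagonal."""
    n_noncore = n - n_core
    values = n_core * n_core + n_core * n_noncore + n_noncore
    return values * bytes_per_value

TIB = 2**40
GIB = 2**30

n_genotyped = 2_000_000   # 2M genotyped animals
n_core = 14_000           # 14k core animals

regular_tib = dense_inverse_bytes(n_genotyped) / TIB      # about 29
apy_gib = apy_inverse_bytes(n_genotyped, n_core) / GIB    # about 208
```

Under these assumptions the APY storage grows linearly with the number of non-core animals, whereas the regular inverse grows quadratically with all genotyped animals.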


Should we include all genotyped animals?

Cooper et al. (2015)

[Figure: Reliability, Final Score, US Holsteins; values 0.63, 0.62, and 0.58; label: 22k bulls + 31k cows + 500k others]

Masuda et al. (unpublished)

Match G and A22

What happens with A22 in APY?

• Fragomeni et al. (2015; unpublished): APY does not work for A22

• Masuda et al. (2017): Rules for inversion of a partitioned matrix

[Image: Y. Masuda, I. Misztal, A. Legarra, S. Tsuruta, D. A. L. Lourenco, B. O. Fragomeni, and I. Aguilar. Technical note: Avoiding the direct inversion of the numerator relationship matrix for genotyped animals in single-step genomic best linear unbiased prediction solved with the preconditioned conjugate gradient. © 2017 American Society of Animal Science. Published January 5, 2017]

$A_{22}^{-1} = A^{22} - A^{21} (A^{11})^{-1} A^{12}$, where superscripts denote blocks of A⁻¹
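The partitioned-matrix rule cited from Masuda et al. (2017) expresses A22⁻¹ through blocks of A⁻¹: A22⁻¹ = A^{22} − A^{21}(A^{11})⁻¹A^{12}, with block 1 for non-genotyped and block 2 for genotyped animals. A minimal NumPy check of this identity follows. The function name and the random test matrix are assumptions; the actual method avoids forming the result explicitly and applies it to vectors inside the PCG iteration.

```python
import numpy as np

def a22_inverse_from_ainv(A_inv, genotyped):
    """A22^-1 from blocks of A^-1, without inverting A22 itself (sketch)."""
    n = A_inv.shape[0]
    g = np.asarray(genotyped)                  # block 2: genotyped animals
    o = np.setdiff1d(np.arange(n), g)          # block 1: non-genotyped animals
    A11 = A_inv[np.ix_(o, o)]
    A12 = A_inv[np.ix_(o, g)]
    A22 = A_inv[np.ix_(g, g)]
    # A^{21} = (A^{12})' because A^-1 is symmetric
    return A22 - A12.T @ np.linalg.solve(A11, A12)

# Hypothetical symmetric positive definite matrix standing in for A.
rng = np.random.default_rng(1)
B = rng.standard_normal((7, 7))
A = B @ B.T + 7 * np.eye(7)
genotyped_idx = [4, 5, 6]
A22_inv = a22_inverse_from_ainv(np.linalg.inv(A), genotyped_idx)
```

In practice A⁻¹ is sparse and built directly from the pedigree, which is what makes this route cheaper than inverting the dense A22.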


Other options for big data
• Angus data
• 500,000 genotyped animals
• 54,000 SNP

• G⁻¹ is a 500,000×500,000 matrix
• A22⁻¹ is a 500,000×500,000 matrix
• SNP-BLUP Z′Z is a 54,000×54,000 matrix

• Indirect representations of G
• Sherman-Woodbury inversions

[Equation: G written as a scaled identity matrix plus ZZ′, with G⁻¹ expressed through the Sherman-Woodbury identity]
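The Sherman-Woodbury idea can be checked numerically for a G of the assumed form G = kI + ZZ′, where the scalar k and the toy dimensions are hypothetical choices for illustration. The inverse then only requires solving a system of size (number of SNP) rather than (number of animals).

```python
import numpy as np

def woodbury_inverse(Z, k):
    """Inverse of G = k*I + Z Z' via the Sherman-Woodbury identity (sketch).

    G^-1 = (1/k) I - (1/k) Z [ (1/k) Z'Z + I ]^-1 Z' (1/k)
    """
    n, m = Z.shape
    inner = Z.T @ Z / k + np.eye(m)            # m x m system (SNP dimension)
    return np.eye(n) / k - (Z / k) @ np.linalg.solve(inner, Z.T / k)

# Toy example: 8 animals, 3 markers, k = 0.5 (all hypothetical).
rng = np.random.default_rng(2)
Z = rng.standard_normal((8, 3))
k = 0.5
G_inv = woodbury_inverse(Z, k)
```

The dense n × n result is formed here only to verify the identity; an indirect representation would keep the factors and apply them to vectors.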



ssGTBLUP – Mantysaari et al. (2017)

• Can use reduced Z by singular value decomposition

ssGTBLUP – Mantysaari et al. (2017)

APY30K: 8 times longer than with AM

ssGTBLUP – Mantysaari et al. (2017)


ssGBLUP with marker effects

• Legarra & Ducrocq (2012): ssGBLUP model on SNP effects and breeding values (BV)

Non genotyped animals

Marker effects

Matrices ZW in this model get very complicated for complex models because they involve very intensive products

Single-step with marker effects

• Rediscovered by Fernando et al. (2016)

• Super hybrid model

Non genotyped animals

Marker effects


Single-step with marker effects

Bolt


Large-scale genomic evaluations?

§ Limited dimensionality of genomic information

§ APY ssGBLUP
§ ui |

§ Number of core depends on Ne
§ # eigenvalues 98% = #core
§ Computing cost greatly reduced
§ Used for commercial large-scale genomic evaluation

Large-scale genomic evaluations?

§ Large-scale genomic evaluations
§ Problem only for 1% of the users

§ Currently, at least 3 different solutions
§ Blupf90
§ MiX99
§ Bolt

§ The exact strategy may depend on the problem

§ Maybe in 10 years all animals are genotyped
§ Old data is forgotten