Introduction to Parameter Estimation

download Introduction to Parameter Estimation

of 52

Transcript of Introduction to Parameter Estimation

Adapted from T.R. Famula Animal Genetics 109

Introduction to Parameter EstimationDr. Fernando Brito Lopes

Goinia GO 2011

1

ContentsIntroduction .................................................................................................................... 2 Expected Value............................................................................................................... 4 Practice and Solution ...................................................................................................... 7 Repeatability (r) .............................................................................................................10 Estimation of repeatability by regression .......................................................................11 Estimate of repeatability by correlation ..........................................................................12 Estimation of repeatability from variance components ...................................................14 Estimation of Heritability by Variance Components.......................................................17 One-Way Classification .................................................................................................17 The Intra-Class Correlation Coefficient..........................................................................18 Two-Way Nested Classification .....................................................................................19 Estimation of Heritability...............................................................................................19 Two-Way Nested Classification .....................................................................................24 Joint Estimation of Heritability and repeatability ...........................................................24 Two-Way Cross Classification Model ............................................................................26 Genetic, Environmental and Phenotypic Correlations .....................................................31 Estimation from components of (co)variances ................................................................31 One-Way Classification .................................................................................................31 Practice and Solution .....................................................................................................36 Exercise .........................................................................................................................41 SAS Programs ...............................................................................................................42 Reference ......................................................................................................................51

2

IntroductionJulius van der Werf The estimation of genetic parameters is an important issue in animal breeding. First of all, estimating additive genetic and possible non-additive genetic variances contributes to a better understanding of the genetic mechanism. Secondly, estimates of genetic and phenotypic variances and covariance are essential for the prediction of breeding values and for the prediction of the expected genetic response of selection programs. Parameters that are of interest are heritability, genetic and phenotypic correlation and repeatability, and those are computed as functions of the variance components. Estimation of heritability is based on methods that determine resemblance between genetically related animals. Roughly, there are two methods that can be used: i) The resemblance between parents and offspring. If we plot the observations on offspring against the values of their parents (either sires, or dams, or their average), we can perform Offspring-Parent regression. The slope of the regression line though is plot reflects how much of the phenotypic differences that we find in parents are retrieved in their offspring. The expected value of the regression line is bOP = 0.5h2. (or h when regression is on midparent mean). Offspring-parent regression is not often used in practice. It requires data on 2 generations, and uses only this data. It is also not able to utilize genetic relationships among parents. However, the method is robust against selection of parents. ii) The estimation of variance components (within and between family components). If the variation within families is large relative to differences between families, the trait must be lowly heritable. Variance components are attributed to specific effects. For example, the (paternal) half-sib variance is due to differences between sires. The variance component represents the sire variance, which is a quarter of the additive genetic variance. Estimation of variance components is easier to generalize, and this method is generally used to estimate genetic parameters. Mostly it is assumed that variances and covariance, and especially the ratio of both of them (like heritability, correlation), are based on particular biological rules, which do not rapidly change over time. However, it is well known that the genetic variance

3 changes as consequence of selection. Changes are especially expected in situations with short generation intervals, high selection intensities or high degrees of inbreeding or in a situation in which a trait is determined by only a few genes. Secondly, the circumstances under which measurements are taken can change. If conditions are getting more uniform over time, the environmental variance decreases, and consequently the heritability increase. Thirdly, the biological interpretation of a trait can change as consequence of a changed environment; feed intake under limited feeding is not the same as feed intake under ad-lib feeding. In conclusion, there are sufficient reasons for regular estimation of (co-)variance components.

4

Expected ValueTeaching selection index and animal breeding techniques is nearly impossible without the use of a statistical concept called Expected Value. Though difficult to master at first, the rest of this course will be greatly simplified if you always remember to return to the basic of expected value. Recall the basic algebraic model of quantitative genetics: P = G + E. Remember also that the term we wish to improve, G, is unobservable. Only P can be directly measured. Expected value will be the tool we use to relate P to G in our search for efficient and accurate selections aids. The difficult part for students is to master the topic of expected value a concept that is statistical and quite theoretical. To simplify this learning process we will rely on 6 simple rules. If you follow these rules, mastery of expected value is assured. Another term that is used to describe expected value is average value. The average is, after all, what you would expect to see. The symbol most often used to express expected values is E[expression]. In establishing our 6 rules, consider the following definitions of notation. Let c is a constant; X is a random variable with mean x; and variance x; and Y is a random variable with mean y; and variance y; and covariance with X, Rule 1:xy.

E[c] = c. In words, the expected value of a constant is a constant. No surprises here. Similarly E[c] = c, i.e., the expected value of a squared constant is the constant squared.

Rule 2:

E[X] = x. The average value (expected value) of a random variable is its mean. Once again, its not surprise.

Rule 3:

E[cX] = E[c]E[X] = cx. The expected value of a constant time a random variable is the constant times the expected value of the random variable. The point to learn, when taking expectations with constants, the constants can be taken out of the expected value.

5 Rule 4: E[X+Y] = E[X] + E[Y] = x + y. The expectation of a sum can be found as the sum of the expectations. In the language of mathematics, expectation is a linear operator.

Rule 5:

E[X] =

x + x. Unlike the expected value of a squared constant, the

expected value of a squared random variable is not just the mean squared but also includes the variance of the random variable.

The derivation of Rule 5 is based upon the definition of a variance. In words, a variance is the average (when you read average think expected value) squared deviation of a random variable from its mean. Or, symbolically, x = E[(X-x)]. Lets do it. x = E[(X- x)] = E[X 2X x + x] = E[X] E[2X x] + E[x] = E[X] 2xE[X] + x = E[X] 2x(x) + x = E[X] x. (expand) (by Rule 4) (by Rule 1 and 3) (by Rule 2) (a little algebra to get Rule 5)

Note that if (and only if) x = 0, then E[X] = x. Note also that the variance of the random variable X can also be written as x = E[X] {E[X]} = E[X] x.

Rule 6:

E[XY] =

xy

+ xy. Much like Rule 5 for products, Rule 6 show that the

expected value of cross-products is the sum of the covariance plus the product of the two means. The derivation of this rule is similarly based upon the definition covariance; which is the average value of the product of deviations of random variables from their respective means. Or symbolically, E[(Xx)(Yy)]. Lets do it. = E[(X-x) (Yy)] = E[XY X y Yx + xy] = E[XY] E[Xy] E[Xx] +E[xy] = E[XY] yE[X] xE[Y] + xy = E[XY] xx xy + xy (expand) (by Rule 4) (by Rule 1 and 3) (by Rule 2)xy

=

xy

6 = E[XY] xy. (and algebra to get Rule 6)

Note that if either x = 0 or y = 0 then E[XY] =

xy.

Summary of Rules

Rule 1: Rule 2: Rule 3: Rule 4: Rule 5: Rule 6:

E[c] = c E[X] = x E[cX] = E[c]E[X] = cx E[X+Y] = E[X] + E[Y] = x + y E[X] = x + x E[XY] =xy

+ xy

The next step is to be able to apply these rules to problems in animal breeding. Usually that involves finding the expected value of expressions and formulas that are far more complicated that the six simple ones outline above. For example, what would be E[(Xi)] (i.e., the expected value of the squared of a sum)? In solving such problems it is also important the following 4 steps:

Step 1: Step 2: Step 3: Step 4:

Expand the Function Substitute the Model Do Expectations Term by using the 6 Rules Defined Above Collect the terms

An example, suppose an observation on animal i has the following model: Pi = + Gi + Ei, where is a constant, Gi is the genetic value of this animal and Ei is the environmental contribution to the phenotype. Moreover, Gi and Ei are random variables with G = E = 0, E[Gi] = G, E[Ei] = E andGE

= 0 for any E E and G G

phenotype. For example, what is the mean phenotype of the population (i.e., what is E[Pi])?

7 a) P = E[Pi] = E[ + Gi + Ei] = E[] + E[Gi] + E[Ei] =+0+0 =

b) E[P]

= E[( + Gi + Ei)] = E[ + Gi + Ei + 2Gi + 2Ei + 2EiGi] = E[] + E[Gi] + E[Ei] + E[2Gi] + E[2Ei] + E[2Ei Gi] = + G + E + 0 + 0 + 0 = + G + E

c)

P

= E[Pi] {E[Pi]} = + G + E = G + E

Practice and Solution1 - Let Xi = Q+ei where, Xi is an observation, Q is a constat and ei is a random variable with Qi=0 and variance We2. The covariance between es, cov(e, e)=0. For n=3 e i=1, 2, 3, find the expected values. Show all steps.

a. E(Xi) = E(Q+ei) = E(Q) = Q b. E(Xi2 ) = E[(Q+ei)] = E[(Q+2Qei+ ei)] = E(Q)+E(2Qei )+E(ei) = E(Q)+E(2)E(Q)E(ei)+E(ei) = Q + var(ei) c. E(X1X2) = E(X1).E(X2)+cov(X1, X2) = E(Q1+e1 )(Q2+e2) + 0 = Q1.Q2 = Q2 d. E(Xi Xi) = (para i { i) = E(Q+ei).E(Q+ei ) = Q2 e. E(7Xi) = E(X1+X2+X3) = E(X1)+E(X2 )+E(X3 ) = Q+Q+Q = 3Q

8 f. E(mdia amostral) = E(7Xi/n) = Q g. E[7(Xi - Qx)] = E[3(Xi-Qx)] = 3(Q-Q) = 0 h. E[7(Xi - Qx)2] = E[3(Xi-2XiQx+Qx)] = E(3Xi)-E(6XiQx)+E(3Qx) = 3E(Xi)-6[E(Xi)E(Qx)]+3E(Q x) = 3[Q + var(ei)] - 6QQ + 3Q2 = 3Q + 3var(ei) - 6Q+3Q2 = 3var(ei) - 6Q + 6Q2 = 3var(ei) = var(ei) i. E[7(Xi - Qx)2/n] = E{[3(Xi - 2XiQx + Qx)]/3} = E(Xi - 2XiQx + Qx) = E(Xi) - E(2XiQx) + E(Qx) = E(Xi) - 2[E(Xi)E(Qx)] + E(Qx) = Q + var(ei) - 2(QQ)+Q2 = Q + var(ei) - 2Q+Q2 = var(ei) - 2Q+2Q2 = var(ei)

2 - Let Pij = Q + Gi + eij where, Pij is the jth record of ith animal, Q is a constant, Gi is a random variable with QG = 0 and variance WG2 and eij is a random variable with Qe = 0 and variance We2. The covariance between the es and Gs or G with any e =0. For i = 1, 2, 3 and j = 1, 2, 3, 4, find the expected values. Show all steps. a. E(Gi) = 0 b. E(Pij ) = Q c. E(Gi2) = var(Gi)

9 d. E(eij2) = var(eij) e. E(Pij2) = E(Q+Gi+eij) = E(Q+Gi+eij+2QGi+2Qeij+2Gi eij) = E(Q)+E(Gi)+E(eij)+E(2QGi)+E(2Qeij)+E(2Gi eij) = Q+var(G)+var(eij)+0+0+0 = Q+var(G)+var(eij) f. E(G1 G2) = 0 g. E(e11 e12) = 0 h. E(e11 e21) = 0 i. E(P11 P12) = E[(Q + G1 + e11)(Q + G1 + e12)] =E(QQ)+E(QG1)+E(Qe12 )+ E(G1Q)+E(G1 G1)+E(G1e12)+ E(e11Q)+E(e11 G1)+E(e11 e12) =E(Q)+0+0+0+E(G1)+0+0+0+0 = Q + var(G1 ) j. E[(P11 - QP)2] = E[(Q + G1 + e11 - Q)] = E[(G1 + e11)] = E[G1 + 2G1 e11 + e11] = E[G1] + E[2G1 e11 ] + E[e11] = var(G1 )+var(e11) k. E(Pij2) [E(Pij)]2 = E[(Q + Gi + eij)]-(E[Q + Gi + eij]) = E[Q + Gi + eij + 2QGi + 2Qeij + 2Gi eij] [E(Q) + E(Gi) + E(eij)] = E[Q] + E[Gi ] + E[eij] + 0 + 0 + 0 - Q + 0 + 0 = Q + var(G) + var(e) - Q = var(G)+var(e)

10

Repeatability (r)Repeatability is defined as the ratio of the variance due to animal effects to the total, phenotypic variance. That is, r = A ( A + E). Repeatability is the fraction of the difference from mean in one record which is expected in another record on the same animal. Repeatability thus is the regression coefficient for a subsequent record on a previous and is also the correlation between on the same animal. Model: Pij = + Ai + Eij, where Pij = jth record on ith animal; Ai = effect on Pij of the animal (genetic plus permanent environmental effects), and Eij = random temporary environmental effect.

i = 1, , B B = number of animals

j = 1, , ni ni = number of records on ith animal E[Ai] = A = 0 E[Ai] = A E[Ai Aj] = 0 E[AiEij] = 0 E[AiEi j] = 0 E[Eij] = = 0 E[Eij] = E E[EijEij] = 0 E[EijEij] = 0 These imply the following equivalent expressions for completing the model.

E[Pij] = P = E[Pij] + A + E E[PijPij] = + A E[PijPij] =

11 The third way of completing the model is to state: As are IDD1 (0, A), Es are IDD(0, E) and that the As and Es are not correlated.

Estimation of repeatability by regressionExpectation of sum of squares and products for estimations repeatability from regression and correlation: (2 records for each of n animals, i=1, , n and j=1 and 2).

Thus, where ^ indicates an estimate:

1

IDD: Idependent and Identically Distributed

12

Note

and

.

The estimate of repeatability by regression of second record on first record is: , where the expectation by parts is .

The variance of the estimate is:

.

Estimate of repeatability by correlation

where the expectation by parts is:

.

If some animals are not allowed to have a second records because of a low first records then the records is less than from records of these animals that were allowed to have second by a factor, k. i.e., , where k depends is also less than by the same from

on the intensity selection. Fortunately the factor, k, i.e.,

. The correlation estimate is biased since the .

records of only animals allowed second records is not

13 In the unusual situation when the choosing of pairs of first and second records depends on the size of the second records both regression and correlation estimates of heritability will be biased. Example of computing repeatability by regression and correlation. The following set of first and second records on 10 animals was draw from a population with = 500,A

= 20, and

E

= 10, i.e., Animal (i) 1 2 3 4 5 6 7 8 9 10

. Pi1 504 542 523 471 505 543 500 460 474 522 P.1 = 5.044 Pi2 533 548 522 484 495 543 495 479 449 527 P.2 = 5.075

14

To illustrate the variability in estimates from small samples the following table shows 20 estimates from samples of size 10 from the same population.

Estimation of repeatability from variance componentsThe ith animal has ni records, j = 1, , ni; and B animals, i =1, , B. The expectations of the usual sums of squares used in estimating A and E are:

The variance components to be estimated are replaced in the expectations by estimates of the variance components and then equated to the computed sums of squares. The estimates turn out to be:

, where where

, .

Thus

. The expected values of the parts give

. The approximate

variance of this estimate is:

15

Where

.

Example of estimating repeatability from variance components

The following set of 10 records on 5 animals was drawn was from a population with = 1000,A

= 10, and

E

= 10. Record (j) 2 976 1007 --1017 1022

Animal (i) 1 2 3 4 5

1 979 988 994 1004 1015

3 984 ---------

Pi. 2939 1995 994 2021 2037 P.. = 6386

ni 3 2 1 2 2 n. = 10

16

Exercise1 Milk production (kg) in Jersey cattle. Cow 1 2 3 4 5 1 17.78 18.65 16.66 16.99 19.33 2 21.55 24.79 21.01 21.08 25.77 Lactation (Milk, kg) 3 4 26.55 23.55 29.79 24.79 26.01 22.01 24.08 22.08 29.77 25.77 5 18.78 19.65 18.66 18.99 19.33 6 16.98 17.95 15.96 15.99 18.93

a) Computing the repeatability between the milk productions of the 1 and 2 lactations by regression and correlation methods! b) Computing the repeatability over all lactations using (co)variance components.

Table 1. Analysis of variance for milk production Source DF SS Model (between dam) Error (within dam) Corrected Total Dam - 1 DF(total)-DF(dam) Prod. - 1 SSdam

MS SS DF(dam)

E(MS) Ww+kWs Ww

SSerror SS DF(error) SStotal

17

Estimation of Heritability by Variance Components One-Way ClassificationThe one-way classification model is: , where is a constant; bi

is an effect in common to members of the ith genetic group; wij is a random effect associated with the jth member of the ith group; i = 1, , B; j = 1, , ni. The bs are IDD(0,2 b), 2

the ws are IDD(0,

w)

and bs and ws are mutually , can be shown to be

uncorrelated. The variance among groups,

the covariance between animals in the same group with aii, the additive relationship between pairs of the ith group, and dii, the corresponding dominance relationship. From genetic theory equals . Thus which by definition . Note

that dii = aii = 0 for i i, i.e., between groups that are not related to each other. Because and If , then, ; then . .

In terms of the Ps the expected values are:

model lead to the following expectations of the three sums of squares (also called quadratics) usually used to estimate :

. These definitions of the

.

18 The estimates are obtained by equating the quadratics to their expected values:

The estimate of the intra-class correlations is

and the estimate

of heritability is: h = (1/aii)t. The approximate variance of this estimate is: V(h) = V(t)/aii, where

Note that heritability is a simple multiple of the intra-class correlation coefficient, i.e., h=t/aii which is the reason that this method is often called the intra-class correlation method of estimating heritability.

The Intra-Class Correlation CoefficientThe term, intra-class correlation coefficient, is, on examining al the adjectives, simply the correlation between records of any of animals in the same class, in this case in the same genetic group, i.e.:

19 But . and . Thus

The genetic groups will often be groups with the same sire (paternal sib groups, aii = , dii = 0), with the same sire and dam (full sib groups, aii = , dii = ), or with the same dam (maternal sib groups, aii = , dii = 0). For this model to be correct, the groups cannot be related to each other, e.g., no sire can be the sire of more than one full sib group.

Two-Way Nested Classification Estimation of HeritabilityThis model is appropriate when sires are mated to many dams, but each dam is mated to only one sire with one or more progeny per dam: where

is a constant;

si is the effect common to all animals with the ith sire; dij is the additional effects common to all animals with the jth dam mated to the ith sire, i.e., si + dij = ijth full sib effect so that dij = (full sib effect)ij si; wijk is a random effect associated with the record of the kth member of the ijth full sib group. i = 1, , S; j = 1, , mij; k = 1, , nij; m. = D.

The total number of dams, and with the ss, IDD(0, s); the ds, IDD(0, d); the wa IDD(0, w) and the ss, ds, and ws mutually uncorrelated.

Note that:

20

is the covariance among paternal half sibs;

is the covariance among full sibs, then , minus the covariance among paternal half sibs, also includes , the variance of maternal effects) and , minus and . (if maternal effects exist,

is total phenotypic variance,

If

then

.

In terms of different combinations of the Ps the expected values are:

(records of full sibs)

(records of paternal half sibs)

(records of unrelated animals)

These definitions of the model lead to the expectations of the four quadratics used to estimate :

21

When the quadratic are equated to their expected values the estimates are:

Then .

If

, the estimate of heritability from the sire variance is:

i.e.,

.

If

can be assumed to equal zero then two other estimates that use dam

component of variance are:

i.e.,

.

22 If and other second and higher order components, Example of heritability by variance components for the two-way nested classification model: progeny with one record each, nested in dams, nested in sire. The following set of 50 records from 10 dams mated to 5 sires was drawn from a population with = 150, s = 50, d = 75 and w = 375.Sire Mate (i) Dam(j) 1 1 1 1 Progeny (k) 1 2 P11.= 1 2 3 4 5 6 7 8 P12.= P1..= 2 2 1 1 1 2 P21.= 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 3 4 5 6 7 8 P22.= P2..= Pijk 137 166 303 142 103 125 153 180 170 157 153 1183 1486 110 112 222 190 199 191 171 156 158 171 184 1420 1642 Sire Mate (i) Dam(j) 3 1 3 1 Progeny (k) 1 2 P31.= 1 2 3 4 5 6 7 8 P32.= P3..= 4 4 1 1 1 2 P31.= 4 4 4 4 4 4 4 4 2 2 2 2 2 2 2 2 1 2 3 4 5 6 7 8 P42.= P4..= Pijk 169 163 332 126 173 176 154 169 179 178 191 1346 1678 144 144 288 170 147 147 161 159 125 128 128 1165 1453 Sire Mate (i) Dam(j) 5 1 5 1 Progeny (k) 1 2 P51.= 1 2 3 4 5 6 7 8 P52.= P5..= Pijk 138 150 288 136 142 128 145 168 149 152 137 1157 1445

etc., are zero then:

1 1 1 1 1 1 1 1

2 2 2 2 2 2 2 2

3 3 3 3 3 3 3 3

2 2 2 2 2 2 2 2

5 5 5 5 5 5 5 5

2 2 2 2 2 2 2 2

P=

7704

23

Note n.. = 50, D = 10, S = 5.

24 This estimate illustrates the points that estimates of h by this method may be outside of the theoretical limits of 0 and 1. In this case, and

, which is also obvious unreasonable because that is greater than the total phenotypic variance.

Two-Way Nested Classification Joint Estimation of Heritability and repeatabilityThe model applies when a sire has many progeny each with one or more records and dams are assumed to be unrelated: where is a constant; si is the effect common to all animals with the ith sire;cij

is the additional effects common to all records of the jth progeny sired by the ith sire, i.e., si + cij corresponds to animal effects of repeatability model;

wijk is what is left over, a random effect associated with the kth record of the ijth progeny of the ith sire. i = 1, , S; j = 1, , mij; k = 1, , nij; m. = C.

Note that this is the same statistical model as the previous one except that cij appears rather than dij and C rather than D. The assumptions about the elements of the model are the same except that now c is interpreted differently from d.

Again,

25

But now,

with environmental effects. Thu,

where

is the variance of permanent

Now,

is the variance of temporary environmental effects where in the previous some genetics variance.

heritability model,

If

, then

.

The expectations of different combinations of the Ps are:

(records of the same animal)

(records of paternal half sibs)

(records of unrelated animals)

The same four quadratics as before are used to estimate expectations lead to the estimates:

and

. Their

26

The estimate of phenotypic variance is:

The estimate of heritability is:

The estimate of repeatability is:

The within sire estimate of repeatability is:

Because the computing procedure is the same as for the previous example no example is given. Instead, as an exercise in simulating models and in computing estimates for the two-way nested model for joint estimation of h and r the following is given.

Two-Way Cross Classification ModelThe mating pattern for this model is that sires are mated to many dams, dams are mated to many sires, and each progeny has one record.

27 where is a constant; si is the effect common to all animals with the ith sire; dj is the effects common to all animals with the jth dam; (sd)ij is the effect (difference from si + dj) common to all animals in the ijth full sib group, so that (sd)ij = (full sib effect)ij si dj; and, wijk is a random effect associated with the kth member of the ijth full sib group. i = 1, , S; j = 1, , D; k = 1, , nij; C = number of matting or nijs > 0.

The ss are IDD(0, s); the ds, IDD(0, d); the (sd)s are IDD(0, (sd)); the ws IDD(0, w) and the ss, ds, (sd)s and ws are mutually uncorrelated.

, is the covariance among paternal half sibs;

, is the covariance among maternal half sibs;

, is the difference between the covariance among full sibs and the covariance among paternal sibs and among maternal sib, ( is the covariance among full sibs); is the total phenotypic variance both the two-way cross classification model. and of

If

, then,

and

. In

terms of different combinations of the Ps the expected values are:

28

(records of full sibs)

(records of paternal half sibs)

(records of maternal half sibs)

(records of unrelated animals)

These definitions of the model lead to the following expectations of the five quadratics used to estimate and :

With,

29

When the quadratic are to their expected values the estimates are:

Let:

Then:

30 Also:

If

then there are three estimates of

The three corresponding estimates of heritability in the narrow sense are:

The estimate of

is

31

Genetic, Environmental and Phenotypic CorrelationsIf X and Y denote the two traits then the genetic, environmental and phenotypic correlations are defined as:

Genetic:

Environmental:

and Phenotypic:

Where and

and

are genetic, environmental and phenotypic variance and

are the corresponding genetic, environmental and phenotypic covariance

between traits X and Y. Usually only additive genetic effects are assumed to contribute to the genetic variance and covariance.

Estimation from components of (co)variances One-Way Classificationsimilar) (extension to more complex classification models is

The model for two traits, one-way classification model is analogous to the single traits models used for estimating heritability with sires providing the group classification:

32 Trait x:

Trait y:

, both measured on animal ij, where:

is a constant for the appropriate trait; bi is an effect common to ith genetic group for the appropriate trait; wij is an effect associate with jth member of ith group aii is the additive relationship among animals in group I with i = 1, , B; j = 1, , ni.

If all genetic effects are additive genetic effect then:

The bs are

, the ws are

, the bx are uncorrelated ; and wxij

with wss the bys are uncorrelated, bxi and byi have covariance and wyij (on the same animal) have covariance .

In terms of the bs these definitions are:

33

These definitions lead to the following expectations of the six sums of squared and three sums of products used for estimating the (co)variance.

34

The expectations of the quadratics are equated to their computed values to estimate the (co)variance. Estimates of the variance components are as before:

where u = x or y.

Similarly the estimates if the covariance components are:

Then,

35

Estimates of variance components for the one-way classification model provide, as before, estimates of heritability:

The expectations by parts of these estimates are rg, rp, re, hx and hy. The estimates of heritability can be outside the parameter limits of 0 and 1. The estimate of genetic correlation may be, and with small data set often is, outside the parameter limits of -1 and +1.

36

Practice and Solution1. The pounds of grease fleece weight were measured in a sample from a sheep population. The data listed below is for the average of both parents (X, midparent) and their offspring (Y).

X Y

11.8 7.7

8.4 5.7

9.5 5.8

10.0 7.2

10.9 7.3

7.6 5.4

10.8 7.2

8.5 5.6

11.8 8.4

10.5 7.0

(a) Calculate the regression coefficient of offspring on midparent and estimate the heritability of grease fleece weight in this population. X 11.80 8.30 9.50 10.00 10.90 7.60 10.80 8.50 11.80 10.50 Y 7.71 5.70 5.80 7.20 7.30 5.40 7.20 5.60 8.40 7.00 X 139.24 68.89 90.25 100.00 118.81 57.76 116.64 72.25 139.24 110.25 Y 59.44 32.49 33.64 51.84 53.29 29.16 51.84 31.36 70.56 49.00 XY 90.98 47.31 55.10 72.00 79.57 41.04 77.76 47.60 99.12 73.50

The regression of offspring on midparent is an estimate of heritability: h = 0.67

37 (b) Plot the data and draw the regression line.

9.0 8.5 8.0 7.5 Y 7.0 6.5 6.0 5.5 5.0 7.0 8.0 9.0 X The point plotting establish the regression line with slope b = 0.6675. For every 1 lb increase in midparent values, offspring tend to produce 0,67 lb. Thus, h = 0.67. 10.0 11.0 12.0 y = 0.6675x + 0.0758 R = 0.9005

(c) Calculate the correlation coefficient and from that estimate the heritability. The correlation coefficient (r) is calculated using formula:

38 Or the heritability can be estimated by square root of determination coeficient: R =

0.9005 then

2. Evaluation of the heritability and repeatability by the paternal half sibs, as well as genetic, environmental and phenotypic correlation.

Analysis of variance between paternal half sibs for the birth weight in Nellore cattle and calving interval in Guzera cattle (Campos, 1999) Source Between sires Progeny within dams Between sires Progeny within dams Heritability (BW) DF Mean Square Birth weight (BW) in Nellore 41 35.90 1206 15.66 Calving interval (CI) in Guzera 21 18132.16 287 9888.22 E(MS) W w+29.45Ws W w Ww+14.1Ws W w

w = 15.66 T = w + 29.45 s 35.90 = 15.66 + 29.45 s s = (35.90 15.66)/29.45 = 0.68 (1/4 s) Heritability (CI)

w = 9888.22 T = w + 14.1 s 18132.16 = 9888.22 + 14.1 s s = (18132.16 9888.22)/14.1 = 584.68 (1/4 s)

39

Anlysis of variance of milk production in Jarsey cattle (Campos, 1999) Source Between dams Within dams w = 316447.7 T = w + 4.02 s 1127422.4 = 316447.7 + 4.02 s s = (1127422.4 316447.7)/4.02 = 201735.0 Analysis of variance and covariance of adjusted weight at 180 (W180) and 270 (W270) days old in Nellore cattle (Campos, 1999) Source Sire Error DF 24 696 Mean Square W180 W270 518.63 1014.4 261.69 478.4 E(MS) Ww+28.16Ws Ww Mean E(MP) Products 483.97 W w+28.16Ws 261.94 Ww DF 404 1199 Mean Square 1127422.4 316447.7 E(MS) Ww+4.02Wd W w

Heritability (W180) w = 261.69 T = w + 28.16 s 518.63 = 261.69 + 28.16 s s = (518.63 261.69)/ 28.16 = 9.12 (1/4 s)

Heritability (W270) w = 478.40 T = w + 28.16 s 1014.4 = 478.4 + 28.16 s s = (1014.4 478.4)/ 28.16 = 19.03 (1/4 s)

40 Genetic correlation (rA) between the W180 and W270 WT = Ww + 28.16 Ws 483.97 = 261.94 + 28.16 Ws Ws = 7.88

Phenotypic correlation (rP) between the W180 and W270

Envoronmental correlation (rE) between the W180 and W270

41

Exercise1. Calculate the heritability, repeatability and genetic correlation by regression, correlation and (co)variance components.

(a) Birth (BW) and weaning weight (WW) of Nellore cattle.

Animal 41 42 43 44 45 46 47 48 49 51 52 53

Sire 1 2 3 1 2 3 1 2 3 1 2 3

Dam 11 11 11 11 11 11 12 12 12 12 12 12

Trait (kg) BW 32.5 32.1 31.9 30.9 31.5 32.8 29.6 29.9 29.8 30.4 30.8 30.2 WW 172.5 172.1 171.9 170.9 171.5 172.8 169.6 169.9 169.8 170.4 170.8 170.2

(b) Milk production of dairy cattle.

Cow 1 2 3 4 5 6 7 8 9 10

1 18.78 19.65 17.66 15.99 20.33 14.91 19.77 18.32 18.27 18.99

2 22.55 25.79 22.01 20.08 26.77 20.01 25.88 24.77 24.67 25.44

Lactation (Milk, kg) 3 4 26.55 23.55 29.79 24.79 26.01 22.01 24.08 22.08 29.77 25.77 24.01 22.01 28.88 23.88 28.77 23.77 28.67 23.67 29.44 25.44

5 17.78 18.65 16.66 16.99 19.33 14.91 18.77 17.32 17.27 17.99

6 16.78 17.65 15.66 15.99 18.33 13.91 17.77 16.32 16.27 16.99

42

SAS Programs1. You can to use the following procedures in SAS for computing de correlation and regression estimates:data repeatability; input animal p1 p2; cards; 1 504 533 2 542 548 3 523 522 4 471 484 5 505 495 6 543 543 7 500 495 8 460 479 9 474 449 10 522 527 ; proc corr data=repeatability cov noprint out=covar; var p1 p2; data estimates; set if _type_ eq 'cov' if _type_ eq 'corr' if rb eq '.' and rc keep rb rc; covar; and _name_ eq 'p1' then rb = p2/p1; and _name_ eq 'p1' then rc = p2; eq '.' then delete;

data param; set estimates; if rb ne '.' then param="reg"; else param="cor"; if rb ne '.' then estimate=rb; else estimate=rc; keep param estimate; proc print data=param noobs; var param estimate; run;

43 2. Example SAS program for estimating repeatability using variance components.DATA REPEATABILITY2; INPUT ANIMAL RECORD @@; CARDS; 1 979 1 976 1 2 988 2 1007 2 3 994 3 . 3 4 1004 4 1017 4 5 1015 5 1022 5 ;

984 . . . .

PROC VARCOMP DATA=REPEATABILITY2 method=type1 maxiter=5000 epsilon=10E12; CLASS ANIMAL; MODEL RECORD = ANIMAL; RUN; QUIT;

44 3. Example SAS program for estimating genetic correlation using regression and correlation approach.data ex_1; input animal p1 p2; cards; 1 501 509 2 517 499 3 512 517 4 468 474 5 496 499 6 499 494 7 530 528 8 528 532 9 517 516 10 506 506 ; proc corr data=ex_1 cov noprint out=covar; var p1 p2; data estimates; set if _type_ eq 'cov' if _type_ eq 'corr' if rb eq '.' and rc keep rb rc; covar; and _name_ eq 'p1' then rb = p2/p1; and _name_ eq 'p1' then rc = p2; eq '.' then delete;

data param; set estimates; if rb ne '.' then param="reg"; else param="cor"; if rb ne '.' then estimate=rb; else estimate=rc; keep param estimate; proc print data=param noobs; var param estimate; run;

45 4. Example SAS program for estimating heritability.data heritability; input animal sire dam weight; cards; 1 1 1 270 2 1 2 285 3 1 3 289 4 1 4 279 5 2 1 282 6 2 2 275 7 2 3 280 8 2 4 258 9 3 1 280 10 3 2 278 11 3 3 300 12 3 4 296 ; proc glm data=heritability; class sire; model weight = sire; run;quit;

Analysis of Variance Source Model (between sires) Error (between animals) Corrected Total y Computing of W w: Ww = MSE = 103.61 y Computing of Ws: Ws = MSE MSM = 217.75 103.61 = 28.54 k 4 WT = Ww + Ws = 103.61 + 28.54 = 132.15 y Computing of heritability: h = 4 x Ws = 4 x 28.54 = 0.86 WT 132.15 DF 2 9 11 SS 435,50 932,50 1.368,00 MS 217,75 103,61 E(MS) Ww+kWs Ww

46 5. Using mixed model (mixed procedure) with random effect of sire for estimating the heritability.data heritability; input animal sire weight; cards; 1 1 270 2 1 285 3 1 289 4 1 279 5 2 282 6 2 275 7 2 280 8 2 258 9 3 280 10 3 278 11 3 300 12 3 296 ; proc mixed data=heritability asycov; class sire; model weight = ; random sire; ods listing exclude AsyCov CovParm; ods output asycov=covmat covparms=estmat; proc iml; start seh(V, C, LG, LP, H); Vp = LP`*V; Vg = LG`*V; H = 4*Vg/(Vp+vg); finish seh; use estmat; read all into v; use covmat; read all into c; * Note that SAS introduces an extra first column into the matrix which must be removed; C = C(|1:nrow(C), 2:ncol(C)|); * Order of variance components in v and c matrices is V(R), V(G), V(error); LG = {1, 0}; LP = {0, 1}; call seh(V, C, LG, LP, H); print V, C, H; run; quit;

47 6. Example SAS program for estimating repeatability.data repeatability; input dam $ lactation production @@; cards; A 1 2703 A 2 2000 A 3 1945 A 4 1919 B 1 2310 B 2 2425 B 3 2110 B 4 2558 C 1 2200 C 2 2985 C 3 2693 C 4 2200 D 1 2000 D 2 1900 D 3 2545 D 4 2510 ; proc glm data=repeatability; class dam; model production = dam; run; quit;

A B C D

5 5 5 5

1800 2100 2531 2300

Analysis of Variance Source Model (between dam) Error (between productions) Corrected Total y Computing of W w: DF 3 16 19 SS 511183.00 1468423.20 1979606.20 MS 170394.33 91776.45 E(MS) Ww+kWs Ww

Ww = MSE = 91776.45 y Computing of Ws:

Ws = MS between dam MS between progeny within dam = K Ws = 170394.33 91776.45 = 15723.57 5 y r= Computing of repeatability: Ws Ws+Ww = 15723.57 = 0.15 15723.57+91776.45

48 7. Estimating repeatability by mixed procedure.data repeatability; input dam $ lactation production @@; cards; A 1 2703 A 2 2000 A 3 1945 A 4 1919 B 1 2310 B 2 2425 B 3 2110 B 4 2558 C 1 2200 C 2 2985 C 3 2693 C 4 2200 D 1 2000 D 2 1900 D 3 2545 D 4 2510 ;

A B C D

5 5 5 5

1800 2100 2531 2300

proc mixed data=repeatability asycov; class dam; model production = ; random dam; ods listing exclude AsyCov CovParm; ods output asycov=covmat covparms=estmat; proc iml; start seh(V, C, LG, LP, H); Vp = LP`*V; Vg = LG`*V; H = Vg/(Vp+vg); finish seh; use estmat; read all into v; use covmat; read all into c; * Note that SAS introduces an extra first column into the matrix which must be removed; C = C(|1:nrow(C), 2:ncol(C)|); * Order of variance components in v and c matrices is V(R), V(G), V(error); LG = {1, 0}; LP = {0, 1}; call seh(V, C, LG, LP, H); print V, C, H; run; quit;

49 8. Example SAS program for estimating heritability and its standard errordata one; input animal sire dam yield @@; cards; 11 1 1 29.9 21 1 2 32.9 31 1 3 30.9 12 1 1 28.1 22 1 2 32.5 32 1 3 31.1 13 1 1 33.4 23 1 2 32.7 33 1 3 31.4 14 2 1 29.1 24 2 2 28.1 34 2 3 28.1 15 2 1 28.5 25 2 2 28.5 35 2 3 27.5 16 2 1 28.9 26 2 2 27.9 36 2 3 27.9 17 3 1 29.8 27 3 2 30.8 37 3 3 28.8 18 3 1 30.1 28 3 2 31.1 38 3 3 30.1 19 3 1 31.2 29 3 2 31.2 39 3 3 30.2 ;

41 42 43 44 45 46 47 48 49

1 1 1 2 2 2 3 3 3

4 4 4 4 4 4 4 4 4

30.9 30.5 30.7 27.1 29.5 26.9 30.8 30.1 30.2

51 52 53 54 55 56 57 58 59

1 1 1 2 2 2 3 3 3

5 5 5 5 5 5 5 5 5

33.9 33.1 33.4 29.1 29.5 29.9 34.8 35.1 35.2

61 62 63 64 65 66 67 68 69

1 1 1 2 2 2 3 3 3

6 6 6 6 6 6 6 6 6

30.9 30.5 30.7 31.1 31.5 31.9 30.8 30.1 30.2

proc mixed data=one asycov; class sire dam; model yield = ; random sire dam; ods listing exclude AsyCov CovParm; ods output asycov=covmat covparms=estmat; proc iml; start seh(V, C, LG, LP, H, SE); Vp = LP`*V; Vg = LG`*V; H = VG/Vp; d = (1/Vp)*(LG - (LP*H)); VH = d`*C*d; SE = sqrt(VH); finish seh; use estmat; read all into v; use covmat; read all into c; * Note that SAS introduces an extra first column into the matrix which must be removed; C = C(|1:nrow(C), 2:ncol(C)|); * Order of variance components in v and c matrices is V(R), V(G), V(error); LG = {0, 1, 0}; LP = {0, 1, 1}; call seh(V, C, LG, LP, H, SE); print "Heritability", V, C, H, SE; run; quit;

50 9. Example SAS program for estimating heritability and genetic correlation between the traits.data param; input sire $ pn pd @@; pnd=pn+pd; cards; A 53 210 B 49 210 A 45 185 B 45 204 A 54 220 B 47 184 A 46 217 B 46 190 A 53 230 B 48 245 A 44 160 B 56 200 A 54 177 B 60 180 A 50 215 B 55 195 A 60 220 B 54 170 A 51 145 B 50 175 ;

C C C C C C C C C C

66 31 53 48 50 47 63 46 55 50

189 156 200 220 168 167 242 171 178 158

ods trace on/listing; ods output Estimates1=var1 Estimates2=var2 Estimates3=var3; proc varcomp data=param method=type1 maxiter=5000 epsilon=10E-12; class sire; model pn pd pnd = sire; run; quit; ods trace off; data estimates; set var1 var2 var3; proc transpose data=estimates out=estimates1 (drop=_name_); data estimates2; set estimates1; if col1