BINARY EXPRESSION : AN EMPIRICAL BAYES APPROACH. SY-2-30

NON-LINEAR PREDICTION OF LATENT GENETIC LIABILITY WITH BINARY EXPRESSION : AN EMPIRICAL BAYES APPROACH.

Predicción no-lineal de valor genético subyacente con expresión binaria : un procedimiento Bayesiano empírico.

D. GIANOLA *,**, J.L. FOULLEY *. FRANCE

* Station de Génétique Quantitative et Appliquée, Centre National de Recherches Zootechniques, INRA, 78350 Jouy-en-Josas (France).

** On leave from the Department of Animal Science, University of Illinois, Urbana, Illinois 61801, U.S.A.

I - INTRODUCTION

The implementation of field recording programs aimed at monitoring and evaluating genetic merit of domestic animals has provided an excellent framework for research in methodology of sire evaluation. For "continuous" traits such as milk yield and growth performance, mixed model prediction (Henderson, 1973, 1975) is the method of evaluation of choice in view of its optimality properties. This may not be the case with discrete traits - e.g., calving difficulty or twinning propensity - as many of the assumptions involved in linear model techniques are violated (Thompson, 1979; Gianola, 1980a, 1980b).

Research on genetic evaluation with categorical data is recent and the methods so far proposed have drawbacks. For example, the method of Quaas and Van Vleck (1980) can be applied strictly to random models with one factor, does not take advantage of a possible ordering of response categories, and may give estimates outside of the parameter space (Van Vleck and Karner, 1980). Further, to implement this procedure, estimates of multivariate variance components are needed. Gianola (1980a,b) described a method based on logistic transforms which is conditional on the covariance matrix of the random effects sampled. However, optimality properties are obtained only asymptotically, there is an element of arbitrariness when some of the counts are zero, and multivariate variance component estimates are also required when the number of response categories is larger than 2. None of the proposed methods addresses simultaneous estimation of variance (covariance) components, means and random variables.

A method of satisfactory analysis of categorical data in animal breeding would need to conform to a given underlying genetic model consistent with the theory of quantitative genetics (e.g., Dempster and Lerner, 1950). Further, it should be flexible and general enough to accommodate situations of potential interest such as: 1) BINARY RESPONSES (e.g., survival in cattle); 2) MULTIPLE BINARY RESPONSES (e.g., survival of fetus and twinning); 3) MIXTURES OF CONTINUOUS AND BINARY RESPONSES (e.g., growth and double muscling); 4) ORDERED POLYCHOTOMOUS RESPONSES (e.g., calving difficulty); and 5) MIXTURES OF CONTINUOUS AND ORDERED POLYCHOTOMOUS RESPONSES (e.g., birth weight and calving difficulty). Cases with UNORDERED polychotomous responses may be of interest in animal ethology in studies of choice among alternative activities (Bock and Jones, 1968) or in some dairy cattle type scoring systems.

This communication describes a method which we have subsequently extended to all situations described above. To describe the procedure, we consider the simplest problem, i.e., one binary response variate, and present
a simple numerical example in the context of calf mortality. As the derivations are lengthy and somewhat technical, only the main results are given here for reasons of space. Details and peculiarities pertaining to different settings will be reported elsewhere.

II - METHODOLOGY

Layout - Consider an $s \times 2$ contingency table. In the $i$-th row ($i = 1, \dots, s$) we score a variate into one of two mutually exclusive and exhaustive categories; there are $n_{i1}$ RESPONSES and $n_{i.} - n_{i1}$ NON-RESPONSES, with $n_{i.} \geq 1$ fixed. Note that $n_{i1}$ can be null.

The probability of response in the $i$-th row is logistic, or

$$P_{i1} = (1 + e^{-\mu_i})^{-1} \tag{1}$$

which gives the convenient results

$$\frac{\partial P_{i1}}{\partial \mu_i} = P_{i1}(1 - P_{i1}) \tag{2a}$$

and

$$\mu_i = \ln\left[P_{i1}/(1 - P_{i1})\right] \tag{2b}$$

The connection with the threshold model of quantitative genetics is obtained by assuming an underlying variable $y \sim N(\delta_i, 1)$ such that if $y < t$, where $t$ is a fixed threshold, then a RESPONSE OCCURS. We can write

$$P_{i1} = \int_{-\infty}^{t - \delta_i} \phi(u)\, du = \Phi(t - \delta_i) \tag{3}$$

where $\phi(.)$ and $\Phi(.)$ are the standard normal density and distribution functions, respectively; $t - \delta_i$ is the distance in units of standard deviation between the threshold and the mean of $y$ in the $i$-th row. Next, let $\mu_i = (t - \delta_i)\,\pi/\sqrt{3}$, and write

$$P_{i1} = \Phi(\mu_i \sqrt{3}/\pi) \approx (1 + e^{-\mu_i})^{-1} \tag{4}$$

an approximation justified by Gumbel (1961), Ashton (1972) and Bock (1975). For values of $t - \delta_i$ between -5.0 and 5.0, the difference between (4) and (3) does not exceed .022 (Johnson and Kotz, 1970).
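The quality of approximation (4) can be checked numerically. The sketch below is only an illustration: the function names and the grid spacing are our own choices, not part of the original communication. It evaluates the normal integral (3) and the rescaled logistic function (4) over the range -5.0 to 5.0 and records the largest absolute gap, which should come out at roughly two hundredths, in line with the bound quoted from Johnson and Kotz (1970).

```python
import math

def normal_cdf(x):
    """Standard normal distribution function, as in (3)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_approx(x):
    """Logistic approximation (4), with mu = x * pi / sqrt(3)."""
    mu = x * math.pi / math.sqrt(3.0)
    return 1.0 / (1.0 + math.exp(-mu))

# Grid of distances between threshold and mean, in steps of 0.001
grid = [i / 1000.0 for i in range(-5000, 5001)]
max_gap = max(abs(normal_cdf(x) - logistic_approx(x)) for x in grid)
print(f"largest |(3) - (4)| on [-5, 5]: {max_gap:.4f}")
```

The gap is largest at moderate distances from the threshold and vanishes at zero and in the tails, so estimated probabilities near 0.5 or near the extremes are the least affected by the substitution.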

The logits $\mu_i$ can be described as

$$\mu_i = x_i'\beta + z_i'u \tag{5}$$

where $x_i'$ and $z_i'$ are known row vectors of orders $p$ and $q$, respectively. For the $s \times 2$ table, we write

$$\mu = X\beta + Zu \tag{6}$$

where $\mu$ is an $s \times 1$ vector, $X' = [x_1, \dots, x_s]$, $Z' = [z_1, \dots, z_s]$ and, without loss of generality, $\beta$ is uniquely defined, i.e., $X$ has full-column rank.

The data in the $s \times 2$ table can be represented by the vectors $Y_1, \dots, Y_s$, where $Y_i' = [Y_{i1}, \dots, Y_{i n_{i.}}]$ with $Y_{ir} = 1$ if a response occurs, and $Y_{ir} = 0$ otherwise. Given $\mu$, the likelihood function is:

$$f(Y_1, \dots, Y_s \mid \mu) = f(Y \mid \theta) \propto \prod_{i=1}^{s} P_{i1}^{n_{i1}} (1 - P_{i1})^{n_{i.} - n_{i1}} \tag{7}$$

and $\theta' = [\beta', u']$.

Bayesian setting - Inferences are made from the a-posteriori density

$$f(\theta \mid Y) \propto f(Y \mid \theta)\, f(\theta) \tag{8}$$

where $f(\theta)$ is the a-priori density. This approach is not new in animal breeding as BLUP can be derived in a Bayesian scenario (Rönningen, 1971; Dempfle, 1977). The Bayesian estimator of $\theta$ is the vector $\hat\theta$ that minimizes the a-posteriori expected risk

$$R(\hat\theta; Y) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} l(\hat\theta, \theta)\, f(\theta \mid Y)\, d\theta \tag{9}$$

where $l(\hat\theta, \theta)$ is a loss function. If the loss function is

$$l(\hat\theta, \theta) = (\hat\theta - \theta)'(\hat\theta - \theta) = \sum_{i=1}^{p+q} (\hat\theta_i - \theta_i)^2 \tag{10}$$

the Bayes estimator is the posterior mean

$$\hat\theta = E(\theta \mid Y) \tag{11}$$

or best predictor in the sense of Cochran (1951) and Henderson (1973).

The prior for $\theta$ is a $(p+q)$-variate normal distribution

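$$\begin{bmatrix} \beta \\ u \end{bmatrix} \sim N\left( \begin{bmatrix} \beta_0 \\ 0 \end{bmatrix}, \begin{bmatrix} \Gamma & 0 \\ 0 & G\sigma_u^2 \end{bmatrix} \right)$$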

where $\sigma_u^2$ is a scalar taken as known for the purpose of this communication, and $G$, in genetic applications, is a matrix of additive relationships (Wright, 1922; Malécot, 1969). Note that $t = \sigma_u^{-2}$ is the ratio of the residual variance to $\sigma_u^2$ (this $t$ is a variance ratio, distinct from the threshold $t$ of the underlying scale). Hence, if $u$ is a vector of additive genetic values, $t = (1 - h^2)/h^2$; if $u$ is a vector of "sire effects", $t = (4 - h^2)/h^2$. In general, $h^2$ may be known from previous data.

Estimation - As calculating $E(\theta \mid Y)$ is complex, we approximate the Bayes estimator by the mode of the posterior density (Lindley and Smith, 1972). Leonard (1972) points out that a normal approximation to the posterior is fairly accurate provided that none of the $n_{i1}$ or $n_{i.} - n_{i1}$ are small.

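Maximizing the log of (8) with respect to $\theta$ requires iterative solution. The Newton-Raphson algorithm (Dahlquist and Björck, 1974) can be written as

$$-\left[\frac{\partial^2 L(\theta)}{\partial\theta\, \partial\theta'}\right]_{\theta = \theta^{[i-1]}} \left(\theta^{[i]} - \theta^{[i-1]}\right) = \left[\frac{\partial L(\theta)}{\partial\theta}\right]_{\theta = \theta^{[i-1]}} \tag{12}$$

where $L(\theta)$ is the log of the posterior density, and $\Delta^{[i]} = \theta^{[i]} - \theta^{[i-1]}$ is a vector of corrections at the $i$-th round of iteration. Starting with a guess $\theta^{[0]}$, iteration stops when a certain rule is satisfied. For example, one may choose to stop if $\Delta^{[i]\prime}\Delta^{[i]} < (p+q) \times 10^{-8}$. After some algebra, and taking $\Gamma^{-1} = 0$, i.e., vague prior knowledge about $\beta$, (12) becomes

$$\begin{bmatrix} X'W^{[i-1]}X & X'W^{[i-1]}Z \\ Z'W^{[i-1]}X & Z'W^{[i-1]}Z + G^{-1}t \end{bmatrix} \begin{bmatrix} \Delta_\beta^{[i]} \\ \Delta_u^{[i]} \end{bmatrix} = \begin{bmatrix} X'v^{[i-1]} \\ Z'v^{[i-1]} - tG^{-1}u^{[i-1]} \end{bmatrix} \tag{13}$$

where

$$W^{[i]} = \mathrm{Diag}\left\{ n_{j.}\, P_{j1}^{[i]} \left(1 - P_{j1}^{[i]}\right) \right\}; \quad j = 1, \dots, s;$$

$$v^{[i]} = \left[ \left(n_{11} - n_{1.} P_{11}^{[i]}\right), \dots, \left(n_{s1} - n_{s.} P_{s1}^{[i]}\right) \right]'$$

and

$$u^{[i]} = u^{[i-1]} + \Delta_u^{[i]}.$$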

As in most animal breeding applications p+q is very large, in general (13) cannot be solved by direct inversion at every round, and a more involved numerical strategy is needed. This is not presented here for reasons of space.
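The right-hand side of (13) is the gradient of the log-posterior, and this can be verified on a toy table. The sketch below is a hedged illustration, not the authors' program: the four-row data set, the function names and the value of `t` are invented, the prior on $\beta$ is taken as flat ($\Gamma^{-1} = 0$) and $G = I$. It compares the analytic gradient $[X'v;\ Z'v - tG^{-1}u]$ with central finite differences of $L(\theta)$.

```python
import numpy as np

def log_posterior(theta, X, Z, n1, n, Ginv, t):
    """Log of (8) up to a constant: binomial log-likelihood (7) with
    logistic P, flat prior on beta, and u ~ N(0, G / t)."""
    p = X.shape[1]
    beta, u = theta[:p], theta[p:]
    mu = X @ beta + Z @ u
    # n1*ln(P) + (n - n1)*ln(1 - P) simplifies to n1*mu - n*ln(1 + e^mu)
    loglik = np.sum(n1 * mu - n * np.log1p(np.exp(mu)))
    return loglik - 0.5 * t * (u @ Ginv @ u)

def gradient(theta, X, Z, n1, n, Ginv, t):
    """Analytic gradient, i.e. the right-hand side of (13)."""
    p = X.shape[1]
    beta, u = theta[:p], theta[p:]
    P = 1.0 / (1.0 + np.exp(-(X @ beta + Z @ u)))
    v = n1 - n * P
    return np.concatenate([X.T @ v, Z.T @ v - t * (Ginv @ u)])

# Hypothetical 4-row table: two fixed-effect columns, two random levels
X = np.array([[1.0, -1.0], [1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
Z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
n1 = np.array([1.0, 0.0, 2.0, 1.0])   # responses per row
n = np.array([2.0, 1.0, 2.0, 3.0])    # row totals
Ginv, t = np.eye(2), 5.0
theta = np.array([0.3, -0.2, 0.1, -0.1])

analytic = gradient(theta, X, Z, n1, n, Ginv, t)
h = 1e-6
numeric = np.array([
    (log_posterior(theta + h * e, X, Z, n1, n, Ginv, t)
     - log_posterior(theta - h * e, X, Z, n1, n, Ginv, t)) / (2.0 * h)
    for e in np.eye(theta.size)
])
print(np.round(analytic, 6))
print(np.round(numeric, 6))
```

Agreement between the two vectors confirms that $v$ collects the row-wise differences between observed and expected responses, and that the prior contributes only the term $-tG^{-1}u$ in the random-effect rows.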

The parallelism between (13) and the mixed model equations is clear. In particular, letting

$$q^{[i-1]} = v^{[i-1]} + W^{[i-1]} Z u^{[i-1]}$$

$$Q_\beta^{[i-1]} = W^{[i-1]} X \Delta_\beta^{[i]}$$

with $\Delta_\beta^{[i]}$ the correction to $\beta$ at round $i$, one can write

$$\left(Z'W^{[i-1]}Z + G^{-1}t\right) u^{[i]} = Z'\left(q^{[i-1]} - Q_\beta^{[i-1]}\right)$$

which closely resembles a BLUP evaluation in linear models.
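The rearrangement of the random-effect rows of (13) into the BLUP-like form above can be confirmed numerically. The following sketch uses simulated inputs only: dimensions, the seed, the value of `t` and all variable names are assumptions made for illustration. It solves (13) for the corrections, updates $u$, and then checks that the updated $u$ satisfies the BLUP-like equation.

```python
import numpy as np

rng = np.random.default_rng(1)
s, p, q, t = 12, 2, 3, 10.0                      # hypothetical sizes and ratio
X = np.column_stack([np.ones(s), rng.normal(size=s)])
Z = np.eye(q)[rng.integers(0, q, size=s)]        # one random level per row
n = rng.integers(1, 6, size=s).astype(float)     # row totals n_i.
n1 = rng.binomial(n.astype(int), 0.6).astype(float)  # responses n_i1
Ginv = np.eye(q)                                 # G = I assumed
beta = 0.5 * rng.normal(size=p)                  # current round values
u = 0.1 * rng.normal(size=q)

P = 1.0 / (1.0 + np.exp(-(X @ beta + Z @ u)))
W = np.diag(n * P * (1.0 - P))
v = n1 - n * P

# System (13): corrections for beta and u
lhs = np.block([[X.T @ W @ X, X.T @ W @ Z],
                [Z.T @ W @ X, Z.T @ W @ Z + t * Ginv]])
rhs = np.concatenate([X.T @ v, Z.T @ v - t * (Ginv @ u)])
delta = np.linalg.solve(lhs, rhs)
d_beta, d_u = delta[:p], delta[p:]
u_new = u + d_u

# BLUP-like form: (Z'WZ + G^{-1} t) u_new = Z'(q - Q_beta)
q_vec = v + W @ Z @ u
Q_beta = W @ X @ d_beta
left = (Z.T @ W @ Z + t * Ginv) @ u_new
right = Z.T @ (q_vec - Q_beta)
print(np.allclose(left, right))
```

In this form $q$ plays the role of a working data vector and $Q_\beta$ the adjustment for fixed effects, which is why each round looks like a weighted mixed-model solve.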

Posterior inference - If the posterior is normal, the mean and the mode of the posterior distribution are identical, and

$$-\left[\frac{\partial^2 L(\theta)}{\partial\theta\, \partial\theta'}\right]_{\theta = \theta^*} = I(\theta^*) \tag{14}$$

gives the posterior information matrix (Leonard, 1972), with $\theta^{*\prime} = [\beta^{*\prime}, u^{*\prime}]$ the posterior mode. Hence, the inverse of (14) can be taken as an approximation to the posterior dispersion. Therefore

$$E(k'\beta + m'u \mid Y, \sigma_u^2) = k'\beta^* + m'u^* \tag{15}$$

$$\mathrm{Var}(k'\beta + m'u \mid Y, \sigma_u^2) = [k'\ m']\, I^{-1}(\theta^*) \begin{bmatrix} k \\ m \end{bmatrix} \tag{16}$$

where $k$ and $m$ are known vectors, and

$$\frac{k'(\beta - \beta^*) + m'(u - u^*)}{\left\{ [k'\ m']\, I^{-1}(\theta^*)\, [k'\ m']' \right\}^{1/2}} \sim N(0, 1) \tag{17}$$

This permits making probability statements about linear combinations of $\theta$.

Estimated cell response probabilities can be obtained from (4); approximate confidence intervals on true response probabilities can be calculated from large sample theory.
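Expressions (15) to (17) translate directly into a small routine. The sketch below is illustrative only: `summarize`, `to_probability`, the mode `theta_star` and the diagonal information matrix are hypothetical and are not taken from the example of this communication. It returns the posterior mean, standard deviation and a central interval for a linear combination, and maps the interval to the probability scale through (4).

```python
import numpy as np
from statistics import NormalDist

def summarize(coef, theta_star, info, level=0.95):
    """Mean (15), SD from (16) and central interval from (17) for coef'theta."""
    coef = np.asarray(coef, dtype=float)
    mean = float(coef @ theta_star)
    sd = float(np.sqrt(coef @ np.linalg.solve(info, coef)))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean, sd, (mean - z * sd, mean + z * sd)

def to_probability(mu):
    """Logistic transform (4) of a value on the logit scale."""
    return 1.0 / (1.0 + np.exp(-mu))

# Hypothetical mode and information matrix for three parameters
theta_star = np.array([1.0, 0.4, -0.1])
info = np.diag([4.0, 25.0, 100.0])

# Contrast between the second and third parameters
mean, sd, (lo, hi) = summarize([0.0, 1.0, -1.0], theta_star, info)
p_lo, p_hi = to_probability(lo), to_probability(hi)
print(round(mean, 3), round(sd, 3), round(p_lo, 3), round(p_hi, 3))
```

When `coef` is set to the row $[x_i'\ z_i']$ of a cell, the transformed interval is one way of obtaining an approximate interval for that cell's response probability.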

Fit of the model - When cell counts are large, the statistic

$$X^2 = \sum_{i=1}^{s} \frac{\left(n_{i1} - n_{i.}\hat P_{i1}\right)^2}{n_{i.}\hat P_{i1}\left(1 - \hat P_{i1}\right)} \tag{18}$$

can be referred to an $s - p$ chi-square distribution, with $p = \mathrm{rank}(X)$.

Unknown variance - The procedure can be extended to estimate $\sigma_u^2$ by taking an a-priori distribution for this variance, e.g., an inverted chi-square density (Leonard, 1972; Lindley and Smith, 1972). This introduces additional iterations in which $\theta$ is evaluated at the modal estimate of $\sigma_u^2$, and vice versa. Alternatively, with $\Gamma^{-1} = 0$, (13) yields a system associated with an equivalent linear model from which $\sigma_u^2$ can be estimated by a number of methods.

III - EXAMPLE

Schaeffer and Wilton (1976) present a hypothetical data set for calving ease and calf mortality which is adopted to illustrate the procedures. We arranged the data for calf mortality into a 20x2 contingency table as shown in Table 1. There is a total of 28 records, out of which 20 (71%) occur in the ALIVE category. Calf mortality in heifers is 2/13 (15.4%), and in cows 6/15 (40%), a somewhat artificial situation.

TABLE 1

DISTRIBUTION OF CALF MORTALITY AT BIRTH BY HERD-YEAR, AGE OF DAM, SEX OF CALF AND SIRE OF CALF SUBCLASSES
(Distribución de la mortandad de terneros al nacimiento de acuerdo a sub-clases de rodeo-año, edad de la madre, sexo y padre del ternero)

HERD-YEAR     AGE OF DAM           SEX OF CALF          SIRE OF CALF   CALF STATUS (status del ternero)
(rodeo-año)   (edad de la madre)   (sexo del ternero)                  ALIVE (vivo)   DEAD (muerto)
1             2                    M                    1              1              0
1             2                    F                    1              1              0
1             3                    M                    1              0              1
1             2                    F                    2              1              0
1             3                    M                    2              2              0
1             3                    F                    2              2              1
1             2                    M                    3              1              1
1             3                    M                    3              1              0
1             3                    F                    3              0              1
2             2                    M                    1              1              0
2             2                    F                    1              2              0
2             3                    M                    1              0              1
2             2                    F                    2              1              1
2             3                    M                    2              1              0
2             2                    F                    3              1              0
2             3                    M                    3              1              0
2             2                    M                    4              1              0
2             2                    F                    4              1              0
2             3                    M                    4              0              2
2             3                    F                    4              2              0

TABLE 2

SOLUTIONS BY ROUND OF ITERATION FOR EACH OF TWO STARTING SETS
(Soluciones por ciclo de iteración para cada uno de los dos comienzos)

Solution vector   Starting set (1)   ITERATION (iteración)
(solución)        (comienzo)         0         1         2         3         4
β1                a                  1.036     1.022     1.039     1.039     1.039
                  b                  0.524     0.967     1.036     1.039     1.039
β2                a                  1.312     1.062     1.085     1.086     1.086
                  b                  0.627     1.003     1.081     1.086     1.086
β3                a                 -0.775    -0.586    -0.603    -0.603    -0.603
                  b                 -0.332    -0.538    -0.600    -0.603    -0.603
β4                a                 -0.092    -0.252    -0.248    -0.248    -0.248
                  b                 -0.085    -0.218    -0.247    -0.248    -0.248
s1                a                 -0.059    -0.007    -0.007    -0.007    -0.007
                  b                 -0.025    -0.008    -0.007    -0.007    -0.007
s2                a                  0.070     0.016     0.016     0.016     0.016
                  b                  0.035     0.016     0.016     0.016     0.016
s3                a                 -0.001    -0.005    -0.005    -0.005    -0.005
                  b                 -0.004    -0.005    -0.005    -0.005    -0.005
s4                a                 -0.009    -0.003    -0.003    -0.003    -0.003
                  b                 -0.006    -0.003    -0.003    -0.003    -0.003
ε                 a                  -         0.127     0.012     0.001     <10^-4
                  b                  -         0.223     0.044     0.002     <10^-4

(1) Starting sets computed from the empirical logits ln[(n_i1 + λ)/(n_i. - n_i1 + λ)], with λ = 0.1 and 0.5 in (a) and (b), respectively.

The model for $\mu_i$, the variate associated with the $i$-th row of the table, is

$$\mu_i = H_j + A_k + T_l + s_m$$

where $H_j$ is the effect of the $j$-th herd-year ($j = 1, 2$), $A_k$ is the effect of the $k$-th age of dam ($k = 1, 2$ for heifers and cows, respectively), $T_l$ is the effect of the $l$-th sex ($l = 1, 2$ for males and females, respectively) and $s_m$ is the effect of the $m$-th sire ($m = 1, 2, 3, 4$). In this case, we write the fixed effects as $[H_1\ H_2\ A_1\ A_2\ T_1\ T_2]'$ and

$$u' = [s_1\ s_2\ s_3\ s_4]$$

Further, we define the vector $\beta = [\beta_1\ \beta_2\ \beta_3\ \beta_4]'$, in which $\beta_1$ and $\beta_2$ correspond to herd-years 1 and 2, $\beta_3$ to the contrast $A_2 - A_1$ and $\beta_4$ to the contrast $T_1 - T_2$, so that the matrix $X$ will have full-column rank; $A_2 - A_1$ is the difference between cows and heifers, and $T_1 - T_2$ is the difference between males and females. Hence,

$$X = \begin{bmatrix}
1 & 0 & -1 & 1 \\ 1 & 0 & -1 & -1 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & -1 & -1 \\ 1 & 0 & 1 & 1 \\
1 & 0 & 1 & -1 \\ 1 & 0 & -1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & -1 \\ 0 & 1 & -1 & 1 \\
0 & 1 & -1 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & -1 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & -1 & -1 \\
0 & 1 & 1 & 1 \\ 0 & 1 & -1 & 1 \\ 0 & 1 & -1 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & -1
\end{bmatrix}
\quad \text{and} \quad
Z = \begin{bmatrix}
1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1
\end{bmatrix}$$

Assuming vague prior information about $\beta$ and $\mathrm{Var}[s_1\ s_2\ s_3\ s_4]' = I\sigma_u^2$,
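we can write (13). Since the heritability of calf mortality in a 0-1 scale is about 5%, under the assumptions made in this data set, we can take as heritability in the underlying scale (Dempster and Lerner, 1950) the quantity

$$h^2 = \frac{.05 \times .71 \times (1 - .71)}{(.342)^2} = .088$$

where .342 is the ordinate of the standard normal density at the threshold corresponding to an incidence of 71%. Hence, as $u$ is a vector of sire effects, in (13) $t = (4 - .088)/.088 = 44.45$.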
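The two numbers above can be reproduced with a few lines of arithmetic. The sketch below is an illustration with invented function names; it assumes the threshold is located from the incidence of 71% through the standard normal distribution, which is how the ordinate .342 arises, and then applies the variance ratios given earlier for sire and additive genetic models.

```python
from statistics import NormalDist

def liability_heritability(h2_binary, incidence):
    """Heritability on the underlying scale from the 0-1 scale value:
    h2_binary * p * (1 - p) / z^2, with z the normal ordinate at the threshold."""
    std_normal = NormalDist()
    z = std_normal.pdf(std_normal.inv_cdf(incidence))
    return h2_binary * incidence * (1.0 - incidence) / z ** 2

def variance_ratio(h2, sire_model=True):
    """t = (4 - h2)/h2 for sire effects, (1 - h2)/h2 for additive values."""
    return (4.0 - h2) / h2 if sire_model else (1.0 - h2) / h2

h2 = liability_heritability(0.05, 0.71)
t = variance_ratio(0.088)          # rounded heritability, as used in the text
print(round(h2, 3), round(t, 2))
```

Using the unrounded heritability instead of .088 shifts `t` slightly, so the value 44.45 should be read as following from the rounded figure.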

Start of iteration - Two different sets of starting values, $\theta_a^{[0]}$ and $\theta_b^{[0]}$, were used to iterate with (13). These were the solutions to

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + tI \end{bmatrix} \begin{bmatrix} \beta_i^{[0]} \\ u_i^{[0]} \end{bmatrix} = \begin{bmatrix} X'w_i \\ Z'w_i \end{bmatrix}; \quad i = a, b$$

where $w_a = \{\ln[(n_{i1} + .1)/(n_{i.} - n_{i1} + .1)]\}$ and $w_b = \{\ln[(n_{i1} + .5)/(n_{i.} - n_{i1} + .5)]\}$ are 20x1 vectors of modified empirical logits. Note that the two solutions are related to unweighted versions of the method proposed by Gianola (1980). The criterion $\varepsilon = [\Delta'\Delta/(p+q)]^{1/2} < 10^{-4}$ was used to stop iteration.

Results are presented in Table 2. Despite the sparseness of the contingency table, convergence was attained in four rounds for each of the two starting sets. Sire solutions stabilized in one round, while it took two rounds for $\beta$ to "converge" at three-decimal accuracy. For practical purposes, iteration could have stopped in two rounds, irrespective of the starting set.

To further examine the behavior of (13), a random vector, $w_c$, containing integer numbers between -5 and 5, was generated. This was used in the preceding equations to obtain starting values. Also, a set of starting values was obtained from BLUP of raw scores (Schaeffer and Wilton, 1976). In both instances, convergence was attained in 4 rounds, with exactly the same final solution as in Table 2. It appears that the algorithm approaches the solution to (13) very rapidly from almost everywhere.
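The iteration described above can be sketched in a few lines with NumPy. This is a minimal re-implementation under stated assumptions, not the authors' program: the cell counts are transcribed from Table 1, the coding of $X$ and $Z$ follows the design matrices of the example, $G = I$, $t = 44.45$, and the starting values come from the empirical logits with a constant of 0.5 (starting set b). All function and variable names are our own.

```python
import numpy as np

# Table 1, one tuple per cell: (herd-year, age of dam, sex, sire, alive, dead)
CELLS = [
    (1, 2, "M", 1, 1, 0), (1, 2, "F", 1, 1, 0), (1, 3, "M", 1, 0, 1),
    (1, 2, "F", 2, 1, 0), (1, 3, "M", 2, 2, 0), (1, 3, "F", 2, 2, 1),
    (1, 2, "M", 3, 1, 1), (1, 3, "M", 3, 1, 0), (1, 3, "F", 3, 0, 1),
    (2, 2, "M", 1, 1, 0), (2, 2, "F", 1, 2, 0), (2, 3, "M", 1, 0, 1),
    (2, 2, "F", 2, 1, 1), (2, 3, "M", 2, 1, 0), (2, 2, "F", 3, 1, 0),
    (2, 3, "M", 3, 1, 0), (2, 2, "M", 4, 1, 0), (2, 2, "F", 4, 1, 0),
    (2, 3, "M", 4, 0, 2), (2, 3, "F", 4, 2, 0),
]

def build_design(cells):
    """X: herd-year indicators, age (+1 cow, -1 heifer), sex (+1 M, -1 F); Z: sires."""
    X = np.array([[hy == 1, hy == 2, 1 if age == 3 else -1, 1 if sex == "M" else -1]
                  for hy, age, sex, _, _, _ in cells], dtype=float)
    Z = np.array([[sire == m for m in (1, 2, 3, 4)]
                  for _, _, _, sire, _, _ in cells], dtype=float)
    alive = np.array([c[4] for c in cells], dtype=float)
    total = alive + np.array([c[5] for c in cells], dtype=float)
    return X, Z, alive, total

def posterior_mode(X, Z, n1, n, t, lam=0.5, tol=1e-4, max_rounds=50):
    """Newton-Raphson rounds of (13) with G = I and a flat prior on beta."""
    p, q = X.shape[1], Z.shape[1]
    T = np.hstack([X, Z])
    prior = np.zeros((p + q, p + q))
    prior[p:, p:] = t * np.eye(q)
    w = np.log((n1 + lam) / (n - n1 + lam))           # modified empirical logits
    theta = np.linalg.solve(T.T @ T + prior, T.T @ w)  # starting values
    for rounds in range(1, max_rounds + 1):
        P = 1.0 / (1.0 + np.exp(-(T @ theta)))
        lhs = T.T @ ((n * P * (1.0 - P))[:, None] * T) + prior
        rhs = T.T @ (n1 - n * P) - prior @ theta
        delta = np.linalg.solve(lhs, rhs)
        theta = theta + delta
        if np.sqrt(delta @ delta / (p + q)) < tol:     # stopping criterion
            break
    P = 1.0 / (1.0 + np.exp(-(T @ theta)))
    info = T.T @ ((n * P * (1.0 - P))[:, None] * T) + prior  # information at the mode
    return theta, info, rounds

X, Z, n1, n = build_design(CELLS)
theta, info, rounds = posterior_mode(X, Z, n1, n, t=44.45)
contrast = np.array([0, 0, 0, 0, 1, -1, 0, 0], dtype=float)  # s1 - s2
sd = np.sqrt(contrast @ np.linalg.solve(info, contrast))
print(np.round(theta, 3), rounds, round(float(sd), 3))
```

Under these assumptions the solution should land close to the last column of Table 2, and the standard deviation of the sire contrast close to the value reported for the sire differences. For data sets where $p + q$ is large, the direct solves used here would have to be replaced by another numerical strategy.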

Cell probabilities were estimated using (4) and the results are in Table 3. As expected (Leonard, 1972; Lindley and Smith, 1972) the probabilities were shrunk towards the overall proportion of live calves (71%) as all data contribute to each $P_{i1}$.

TABLE 3

ESTIMATED PROBABILITY OF A LIVE CALF AT BIRTH, BY CELL
(Probabilidad estimada de supervivencia del ternero, por celda)

CELL      μ-VARIATE    PROBABILITY OF RESPONSE (probabilidad de respuesta)
(celda)   (variable)   OBSERVED (observada)   ESTIMATED (estimada)
1         1.387        1.00                   .80
2         1.883        1.00                   .87
3         0.181        0.00                   .55
4         1.906        1.00                   .87
5         0.204        1.00                   .55
6         0.700        0.67                   .67
7         1.389        0.50                   .80
8         0.183        1.00                   .55
9         0.679        0.00                   .66
10        1.434        1.00                   .81
11        1.930        1.00                   .87
12        0.227        0.00                   .56
13        1.953        0.50                   .88
14        0.251        1.00                   .56
15        1.931        1.00                   .87
16        0.229        1.00                   .56
17        1.438        1.00                   .81
18        1.934        1.00                   .87
19        0.231        0.00                   .56
20        0.727        1.00                   .67

Several linear combinations of elements of $[\beta'\ u']$ and their posterior precision were considered. The square root of the posterior dispersion is in parentheses:

$E(\cdot \mid Y, \sigma_u^2)$ for each comparison:

H1 - H2     -.046 (.884)
A2 - A1     -.603 (.477)
T1 - T2     -.248 (.448)
s1 - s2     -.023 (.209)
s1 - s3     -.002 (.209)
s1 - s4     -.004 (.210)
s2 - s3      .021 (.209)
s2 - s4      .019 (.210)
s3 - s4     -.002 (.210)

The comparison $A_2 - A_1$ indicates that, in this data set, the underlying ability of older cows to produce a live calf at birth would be lower, although not significantly so, than that of young cows. Hence, the probability of a live calf at birth would be higher in the heifers. As discussed before, this is an artifact of the example chosen by Schaeffer and Wilton (1976). Likewise, in this data set 9/14 male calves were alive compared with 11/14 in females; this resulted in a negative value for the comparison $T_1 - T_2$. Among the four sires, $s_2$ had the highest genetic ability to sire live calves.

For this data, $X^2 = 17.28$, with 16 degrees of freedom. Hence, there is no reason to reject the hypothesis that the model fits the data of the example, $P(\chi^2_{16} > 17.28) = 0.37$. Given the sparsity of the contingency table, the approximation of (18) to a chi-square statistic may be poor.
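The fit statistic can be recomputed from the published tables. The sketch below is an illustration with our own function names: it takes the counts of Table 1 and the three-decimal $\mu$-variates of Table 3 as inputs, converts each $\mu_i$ to a probability with the logistic function, and accumulates the squared differences between observed and expected live calves, each divided by $n_{i.}\hat P_{i1}(1 - \hat P_{i1})$. The tail probability uses the closed-form sum that holds for a chi-square variable with an even number of degrees of freedom.

```python
import math

# Live and dead calves per cell (Table 1) and mu-variates per cell (Table 3)
ALIVE = [1, 1, 0, 1, 2, 2, 1, 1, 0, 1, 2, 0, 1, 1, 1, 1, 1, 1, 0, 2]
DEAD = [0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 2, 0]
MU = [1.387, 1.883, 0.181, 1.906, 0.204, 0.700, 1.389, 0.183, 0.679, 1.434,
      1.930, 0.227, 1.953, 0.251, 1.931, 0.229, 1.438, 1.934, 0.231, 0.727]

def fit_statistic(alive, dead, mu):
    """Sum over cells of (observed - expected)^2 / binomial variance."""
    x2 = 0.0
    for a, d, m in zip(alive, dead, mu):
        n = a + d
        p = 1.0 / (1.0 + math.exp(-m))
        x2 += (a - n * p) ** 2 / (n * p * (1.0 - p))
    return x2

def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x) for even df: exp(-x/2) * sum_{j<df/2} (x/2)^j / j!."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half, term, total = x / 2.0, 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

x2 = fit_statistic(ALIVE, DEAD, MU)
p_value = chi2_upper_tail_even_df(x2, df=16)
print(round(x2, 2), round(p_value, 2))
```

Because the inputs are rounded to three decimals, small discrepancies in the last digit of the statistic are to be expected.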

IV - CONCLUSION

The method presented in this paper - with appropriate extensions - offers a general solution to the problem of analysis of categorical data in animal breeding. Important numerical problems arise with large data sets and these will be addressed in future communications.

SUMMARY

The general linear model presents problems when applied to prediction of random variables with categorical expression. Traits of importance in animal breeding, such as calving difficulty, have a discrete distribution, so it is critical to develop adequate methodology. We assume an underlying liability variate which is a linear combination of jointly normal vectors $\beta$ and $u$, the latter representing genetic values. The likelihood function is product binomial, i.e., the expression of the trait is a Bernoulli trial with a parameter dependent on $\beta = \beta^*$ and $u = u^*$. If the loss is quadratic, the Bayes estimator is the a-posteriori mean, or $E(\beta', u' \mid Y)$, where $Y$ specifies the configuration of a contingency table. We replace the normal by a logistic distribution and maximize the posterior density with respect to $\beta^*$ and $u^*$ as an approximation to the Bayes estimator. This yields a non-linear system which can be solved by re-weighting a set of "mixed-model" equations linear in the corrections at every iterate. The procedure provides a prediction of the underlying liability and an estimate of the probability "at risk".

RESUMEN

El modelo lineal general presenta problemas cuando se le aplica a predicción de variables aleatorias con expresión categórica. Caracteres de importancia en mejoramiento animal, tales como la dificultad al parto, presentan una distribución discreta, de forma que es crítico desarrollar una metodología adecuada. Se asume una variable subyacente, combinación lineal de dos vectores conjuntamente normales: $\beta$ y $u$, este último representando valores genéticos. La función de verosimilitud es producto binomial, es decir, la expresión de la variable es Bernoulli con parámetro dependiente de $\beta = \beta^*$ y $u = u^*$. Si la pérdida es cuadrática, el estimador de Bayes es la media a-posteriori, o $E(\beta', u' \mid Y)$, donde $Y$ especifica la configuración de una tabla de contingencia. Se reemplaza la distribución normal por la logística y se maximiza la densidad posterior con respecto a $\beta^*$ y $u^*$. Resulta un sistema no-lineal que se puede resolver re-ponderando una serie de ecuaciones de "modelo mixto" lineal en las correcciones en cada iteración. El procedimiento provee una predicción de la variable subyacente y una estimación de la probabilidad de riesgo.

REFERENCES

ASHTON, W.D. (1972): The Logit Transformation. Hafner Publishing Co., New York.

BOCK, R.D. (1975): Multivariate Statistical Methods in Behavioral Research. McGraw-Hill Book Co., New York.

BOCK, R.D. and JONES, L.V. (1968): The Measurement and Prediction of Judgment and Choice. Holden-Day, San Francisco.

COCHRAN, W.G. (1951): Improvement by means of selection. Proc. 2nd Berkeley Symp. Math. Stat. and Prob., 449-470.

DAHLQUIST, G. and BJÖRCK, A. (1974): Numerical Methods. Prentice Hall, Englewood Cliffs.

DEMPFLE, L. (1977): Relation entre BLUP (Best Linear Unbiased Prediction) et estimateurs bayésiens. Ann. Génét. Sél. anim., 9: 27-32.

DEMPSTER, E.R. and LERNER, I.M. (1950): Heritability of threshold characters. Genetics, 35: 212-235.

GIANOLA, D. (1980a): A method of sire evaluation for dichotomies. J. Anim. Sci., 51: 1266-1271.

GIANOLA, D. (1980b): Genetic evaluation of animals for traits with categorical responses. J. Anim. Sci., 51: 1272-1276.

GUMBEL, E.J. (1961): Bivariate logistic distributions. J. Amer. Statist. Assoc., 56: 335-349.

HENDERSON, C.R. (1973): Sire evaluation and genetic trends. In Proc. of Animal Breeding and Genetics Symp. in honor of Dr J.L. Lush, Amer. Soc. of Anim. Sci. and Amer. Dairy Sci. Assoc., Champaign, Ill.

HENDERSON, C.R. (1975): Best linear unbiased estimation and prediction under a selection model. Biometrics, 31: 423-447.

JOHNSON, N.L. and KOTZ, S. (1970): Distributions in Statistics. Vol. 2, Continuous Univariate Distributions. Houghton Mifflin Co., Boston.

LEONARD, T. (1972): Bayesian methods for binomial data. Biometrika, 59: 581-589.

LINDLEY, D.V. and SMITH, A.F.M. (1972): Bayes estimates for the linear model. J. Roy. Stat. Soc. B, 34: 1-41.

MALÉCOT, G. (1969): The Mathematics of Heredity. W.H. Freeman and Co., San Francisco.

QUAAS, R.L. and VAN VLECK, L.D. (1980): Categorical trait sire evaluation by best linear unbiased prediction of future progeny category frequencies. Biometrics, 36: 117-122.

RÖNNINGEN, K. (1971): Some properties of the selection index derived by "Henderson's Mixed Model Method". Z. Tierz. Zuchtbiol., 88: 186-193.

SCHAEFFER, L.R. and WILTON, J.W. (1976): Methods of sire evaluation for calving ease. J. Dairy Sci., 59: 544-551.

THOMPSON, R. (1979): Sire evaluation. Biometrics, 35: 339-353.

VAN VLECK, L.D. and KARNER, P.J. (1980): Sire evaluation by best linear unbiased prediction for categorically scored type traits. J. Dairy Sci., 63: 1328-1333.

WRIGHT, S. (1922): Coefficients of inbreeding and relationship. Am. Nat., 56: 330-338.
