IBD- based QTL detection in multi-cross inbred designs: A...

1

IBD- based QTL detection in multi-cross inbred designs: A case study of

cereal breeding programs

Sébastien Crepieux*,1, Bertrand Servin† , Claude Lebreton‡ and Gilles Charmet*

*UMR 1095 INRA-UBP, 234 Av. du Brezet, 63039 Clermont-Ferrand Cedex 2, France,

†INRA UMR de Génétique Végétale, INRA/UPS/INAPG, 91 190 Gif sur Yvette, France

‡Limagrain Agro-Industrie, site d’ULICE, av G. Gershwin, BP173, F-63204 Riom Cedex,

France

2

IBD-based multi-cross QTL mapping

1 : corresponding author : 1095 INRA-UBP, 234 Av. du Brezet, 63039 Clermont-Ferrand

Cedex 2, France. Email : [email protected]

Phone : 00 33 4 73 62 43 09 , Fax: 00 33 4 73 62 44 53

Key words: QTL detection, variance component, pedigree breeding, IBD, multi-cross

ABSTRACT

Mapping quantitative trait loci in plants is usually conducted using a population derived from

a cross between two inbred lines. The power of such QTL detection and the parameter

estimates highly depend on the choice of the two parental lines. Thus, the QTL detected in

such populations only represent a small part of the genetic architecture of the trait. Besides,

the effects of only two alleles are characterised, which is of limited interest to the breeder. On

the other hand, common pedigree breeding material remains unexploited for QTL mapping. In

this study, we extend QTL mapping methodology to a generalized framework, based on a

two-step IBD variance component approach, applicable to any type of breeding population

coming from inbred parents. The power and accuracy of this method were assessed on

simulated data mimicking conventional breeding programs in cereals. This method can

provide an alternative to the development of specifically designed recombinant population, by

exploiting the genetic variation actually managed by plant breeders. The use of these detected

QTL in assisting breeding would thus be facilitated.

3

INTRODUCTION

The availability of molecular markers in the 1980’s has opened new scope for quantitative

genetics and breeding. It was thus anticipated that the manipulation of loci underlying

quantitative traits (QTL) would be as easily feasible as with mendelian factors. This

perspective, however, has remained largely unreached, despite the large corpus of theoretical

studies on marker assisted selection ( e.g. LANDE and THOMPSON 1990; GIMELFARB and

LANDE 1994 , 1995 ; HOSPITAL et al. 1997). The main reason is probably the cost of markers

and the relatively low improvement in selection efficiency that leads MAS to be generally

much more expensive than conventional breeding (MOREAU et al. 2000). The other reason is

that applied breeding programs and QTL research are often disconnected, i.e. carried out by

different teams and using different plant material.

Classically, QTL analyses are carried out on a few progenies from broad base crosses, coming

from a small number of distantly related lines, often including wild relatives. Such analyses

mostly involve bi-parental progenies such as back-crosses (BC), doubled haploids lines (DH),

F2 or recombinant inbred lines (RILs). In the approaches based on this kind of plant material,

the effect of an allele substitution at a candidate locus is tested. This is called the fixed model

approach (XU and ATCHLEY 1995) since it considers a fixed number of distinct alleles (most

often two) at each putative QTL. Statistical methods for the QTL analysis of bi-parental

populations underwent successive improvements through the advent of Interval Mapping

(LANDER and BOTSTEIN 1989) and its linearization (HALEY and KNOTT 1992), the Composite

Interval Mapping (ZENG 1993, 1994 ; JANSEN 1993) and multiple trait QTL mapping (KOROL

et al. 1995; JIANG and ZENG).

On the other hand, the breeder’s material is far from the studied bi-parental populations.

Breeders generally handle many small families from crosses between (often) highly related

elite lines. Thus, the above described methods are poorly adapted. Moreover there are many

4

drawbacks for the breeder’s use of the QTL found on bi-parental populations. First, when

only two parents are considered, some markers and potential QTLs are more likely to be

monomorphic, even if parental lines are carefully selected for trait divergence. Since, by

definition, QTL can only be found at polymorphic sites in the genome, the expected number

of QTL detected with a bi-parental cross will be lower than that expected when analyzing

several crosses at a time (assuming the total number of genotypes is not the limiting factor).

The second drawback is that the QTL effect is estimated as a contrast between two alleles and

in one genetic background only. Therefore, in that context, the improvement of a line by the

introgression of a QTL allele in a completely new genetic background is rather unpredictable,

because of possible epistatic interaction between QTL and genetic background. Finally, from

an economic standpoint, the cost of creation of large single cross progenies and specific trials

for trait evaluation to perform QTL detection is quite high and often at the expense of other

selection programs.

All these drawbacks reduce the breeders’ interest for implementing such experimental designs

when funding and work are constrained. Bi-parental crosses are usually preferred for more

upstream studies, e.g. genomics: the fine-mapping of a QTL, which is a pre-requisite for its

positional cloning, is easier when fewer QTLs are segregating. In contrast, the breeders’ focus

will be to characterize the effect of a wide range of alleles in his germplasm. Methods for

simultaneous detection and manipulation of QTL in breeding programs would thus enhance

the applicability of MAS.

In plant breeding, new methods for QTL detection in complex designs, close to those used in

real breeding schemes have already been developed. MURANTY (1996) suggested to work in

plants with progenies from several parents, in order to achieve a high probability to have more

than one allele at a putative QTL, and also to have a more representative estimate of the

variance accounted for by a QTL. She operated in a fixed-effect framework. Simulations

5

demonstrated that a higher QTL detection power was achieved, for a given sample size. XU

(1998) compared the QTL detection powers obtained with random effect models and fixed

effects and found similar values for individual family sizes as low as 25 individuals.

However, in more unbalanced designs, the random effect approach was presumed to be more

suited as it can handle any arbitrary pedigree of individuals (LYNCH and WALSH 1998, XU

1998). Efficient methodologies for more fragmented populations in plants have been

developed (XIE et al. 1998; YI and XU 2001, BINK et al. 2002, JANSEN et al. 2003 for

example), but their extension or implementation for any complex plant designs, implying a

mixing of half-sibs and full-sibs families of different sizes, at any generation of selfing and

with the hermaphrodite status of parents, is not straightforward. The identical-by-descent

(IBD)-based variance component analysis is a powerful statistical method for QTL mapping

in complex populations and can be used in pedigrees of arbitrary size and complexity

(ALMASY and BLANGERO 1998). These IBD-based variance component analyses are derived

from the assumption that individuals of similar phenotype are more likely to share alleles that

are identical by descent. The construction of IBD matrices for alleles at each tested position

along the genome, and the fitting of random effect models (which assumes that QTL effects

are normally distributed) offer an appropriate method to map QTL if the progeny population

is large enough and if the progenies are connected in some way. Besides, these models do not

need to assume a known, finite set of alleles at each putative QTL. Thus, they offer a less

parameterized statistical environment in which to map QTL, because only the variances need

to be estimated instead of every allele substitution effect. Generally, the IBD-based

approaches assume a between family IBD-likelihood of zero (i.e. no parents in common

between the two families), and thus, consider the parents as founders. However, this

assumption is often wrong in common breeding pedigrees. Furthermore, in fragmented

situations, i.e. where there are many families of small sizes (especially when the genotyping

6

takes place at a late stage in pedigree breeding, where we may easily end up with as few as

one or two lines per cross), the IBD-likelihood matrix can be very sparse. Hence, there could

be much to be gained in exploring the actual between family IBD-likelihoods.

In the work related in this paper, we took over these developments and further assumed a non-

zero IBD-likelihood between non-sib lines. We then present a unified IBD-based variance

component analysis framework, to map QTL in any kind of multi-cross designs involving

self-pollinating species, at any generation. To test the accuracy of the method, we developed a

simulation program which mimics the steps of real breeding schemes. We chose the

simulation parameters according to the information provided by breeders on their real

material in order to be as comprehensive as possible in the range of genetic configurations

explored.

This method can provide an alternative to the development of specifically designed

recombinant population, by exploiting the genetic variation actually managed by plant

breeders. Our method can thus provide breeders with valuable information about the breeding

values of their material and help them to design selection strategies.

METHODS

Two-step IBD based variance component method

The method used to map QTL in a complex inbred pedigree is a two step variance component

method, as described in GEORGE et al. (2000). Hence, this method first consists in

constructing the (co)variance matrix of fixed and random effects at each putative QTL

position and then estimates the likelihood of the presence of a QTL at these positions using

appropriate linear models.

7

These two steps are common to all interval mapping based variance components methods.

What differs mainly among all the published methods is the way to calculate the IBD

probabilities (see GEORGE et al. 2000 for a review of IBD probabilities calculation). We

adopted a deterministic approach to infer IBD probabilities for any generation of

recombination and breeding scheme (e.g. F2, Fn, RIL, BCn), based on the MDM program

(SERVIN et al. 2002).

Mixed linear models

We assume that the quantitative trait is a linear combination of fixed design effects, a putative

QTL effect (with additive or/and dominance effect) and additive polygenic effects. The

random polygenic effect is seen as the cumulative effect of all loci affecting the quantitative

trait that are unlinked to the QTL. The model is:

)1(eZvZuXy +++= β

where y is an (m*1) vector of phenotypes, X is an (m*s) design matrix, β is a (s*1) vector of

fixed effects, Z is an (m*q) incidence matrix relating records to individuals, u is a (q*1) vector

of additive QTL effects, v is a (q*1) vector of additive polygenic effects and e is the residual.

We assume that the random effects u, v and e are uncorrelated and distributed as multivariate

normal densities: ),0(~;),0(~;),0(~ 222 evu INeANvGNu σσσ , with 222 , evu and σσσ being

respectively the additive variance of the QTL, the polygenic variance and the residual

variance. A is the (q*q) additive genetic relationship matrix; G is the (q*q) (co)variance

matrix for the QTL additive effects conditional on marker information; and I is the (m*m)

identity matrix.

The model without QTL segregating in the population is, with the same notations:

)2(eZvXy ++= β

8

Computation of the IBD probabilities

We consider a mapping population composed of several sub-populations of small size. Each

of these sub-populations is an offspring coming from an inbred pedigree started with two

parents. For example, these sub-populations could be produced by several consecutive

selfings (e.g. RILs) or back-crossings. We want to compute the probability that two

individuals taken from any of these sub-populations share IBD alleles at a given locus of their

genome. If we consider a pair of individuals from the mapping population, they may be (i)

taken from the same sub-population, in which case they are full-sibs (ii) taken from two

different sub-populations. In this last case, if one of the parents is common to the two sub-

populations, the two individuals will be half-sibs, if the parents of the two sub-populations are

distinct, the two individuals are considered as unrelated. We will now draw relevant

calculations of the IBD probabilities for each of these cases.

IBD value between two sibs at a QTL: Within each full-sib family of the breeding scheme,

only two alleles are segregating giving only three possible genotypes at the QTL: QQ, Qq and

qq.

Following XIE et al. (1998) notations, the IBD value of two individuals i and j, is measured

as:

QqQqorQqqqqqqqor

qqQQQqQQQQQQ

forforfor

jiji −−−

−−−

== ,

,,,

012

2 ,, θπ

where ji,π are the ijth elements of G, and ji,θ is MALECOT’s (1948) coefficient of coancestry.

As pointed out by many authors, when inbreeding is present, ji,π is not interpreted as the

proportion of alleles IBD, but rather as twice the coefficient of coancestry (KEMPTHORNE

1955; HARRIS 1964; COCKERHAM 1983).

9

Inferring the IBD value of a QTL from markers: The IBD value is completely determined

by the genotypes of two individuals at the QTL of interest. The actual QTL genotype of an

individual, however, is not observable, and must be inferred from flanking marker

information. We denote the following probabilities:

)/Pr(),/Pr(),/Pr( 012 MjMjMj IqqpandIQqpIQQp === .

We write [ ] Tiiii pppp 012= and [ ]Tjjjj pppp 012= . The conditional expectations of the

IBD values between two full sibs are:

jTiMjiji CppIE == )/( ,, ππ for between individuals,

and jT

Mjiji pcIE == )/( ,, ππ for the individual with itself, where:

=

=

121

210111012

candC

In the rest of the paper, this formula, for computing IBD probabilities, will be referred to as

formula (1).

General case: Above, we emphasized the calculation of IBD for the full-sib case. The

generalization to the half-sib case is trivial.

Using this first formula to compute IBD probabilities, we assumed that parents of sub-

populations were unrelated, i.e. they did not share any common ancestors. However, in

practice, these parents were coming from previous generations of breeding and were very

likely to share IBD alleles in their genome (due to the intensive use of some “star varieties”,

for example). In order to take these possible relationships between parents into account, we

estimated coancestry coefficients between them, using molecular information, as suggested by

BERNARDO (1993). The pedigree information, if available, could be used to that end.

However, the selection pressure may generate a discrepancy in the predicted proportion of

parental genomes shared by the current lines. Besides, we assumed that the pedigree

10

information was often very scarce or even unavailable. Hence, we resorted to a genetic

similarity based index to estimate that proportion of genome.

First, we generalized the C matrix to the known half-sib, full-sib and unrelated individuals by

introducing the coancestries between parents, estimated by markers in our case. We

considered two individuals taken in two sub-populations. Let P1 and P2 be the parents of the

first sub-population and P3 and P4, the parents of the other sub-population. We denote by

GSP1P3, GSP1P4, GSP2P3 and GSP2P4 the estimates of the coefficients of coancestry between

these parents. Taking into account possible coancestries between P1 and P3 on one hand and

P2 and P4 on the other hand, the C matrix can then be re-written as:

+=

4242

42423121

31

3131

1

20)(

02

PPPP

PPPPPPPP

PPPP

GSGSGSGSGSGS

GSGSC

Note that in the full-sib case P1=P3 and P2=P4, so that GSP1P3 and GSP2P4 are equal to one

and the C1 matrix is similar to C. Similarly, the relevant C matrices for half-sibs individual or

for unrelated individuals can be obtained by replacing respectively GSP1P3 by one and GSP2P4

by zero on one hand, and both GSP1P3 and GSP2P4 by zero on the other hand.

Similarly, taking into account the coefficients between the parents P1 and P4 on one hand and

P2 and P3 on the other hand, we can re-write the C matrix as :

+=

02)(21

20

4141

32413241

3232

2

PPPP

PPPPPPPP

PPPP

GSGSGSGSGSGSGSGS

C

Finally, we can draw a general formula for the conditional expectation of the IBD values

between two individuals coming from four (distinct or not) inbred parents:

jTij

TiMjiji pCppCpIE 21,, )/( +== ππ

11

))])([()])([(

)])([()])([((2)/(..

121

0121

042121

2121

032

121

0121

241121

2121

231,,

iijjPPiijjPP

iijjPPiijjPPMjiji

ppppGSppppGS

ppppGSppppGSIEei

++++++

+++++== ππ

The conditional expectation of the IBD for an individual with itself remains:

12,, 2)/( jjMjiji ppIE +== ππ

In the rest of the paper, this formula, using the C1 and C2 matrices, will be referred to as

formula (2).

The elements in the additive relationship matrix A are estimates of the proportion of IBD

genome between any two lines. The ai,j elements is the proportion of IBD genome between

line i and line j, based on genetic similarities.

For both A and G matrices, genetic similarities (GSi,j) were computed using NEI and LI

(1979) formula.

Implementation of the IBD formula

We used the deterministic approach of the MDM program (SERVIN et al. 2002) to compute all

the pi and pj probabilities. IBD-likelihoods were computed every 3cM. Two flanking markers

were used to infer the genotypes probabilities. In the frequent case where the two parents

shared the same marker alleles at one or two loci flanking the putative QTL position, the next

closest markers to the interval were used. It can easily be demonstrated that the IBD

probabilities calculated at a putative QTL will be more precise if the flanking markers are

highly polymorphic. Issues on the use of better estimates of the coefficients of co-ancestries

will be discussed on the last part of the article.

All the G matrices were then inverted, and written in ASREML (GILMOUR et al. 1998) format

for user-defined inverse (co)variance matrices. We also computed the additive relationship

matrix A. Then, it was inverted and written in ASREML format (end of step1).

In step 2, ASREML provided restricted maximum-likelihood (REML) estimates of (1) and

(2). To test for the presence of a QTL against no QTL at a particular chromosomal position,

12

we used the Log Likelihood Ratio test: LR= -2ln( L0(H0, no QTL present) – L1(H1, QTL

present) ), where L1 and L0 represent the likelihood values of (1) and (2) evaluated at the

REML solutions, respectively.

Test statistic under the null hypothesis

The choice of the threshold in this kind of population is always challenging. Many

publications (ZENG 1994; XU and ATCHLEY 1995 for example) report that when a

chromosomal interval is being scanned, the empirical distribution of LR follows a mixture of

two Chi-square distributions, with one and two degrees of freedom, respectively.

Since this article deals with simulated data, it is possible to replicate data under the null

hypothesis of no QTL segregating, construct the empirical distribution of LR and derive

empirical threshold by choosing the 95th percentile of the highest test statistic, generally over

500 or 1000 stochastic realizations. In this paper, we calculated an empirical threshold for

every set of parameters to see whether significant differences on threshold appeared or not.

Hence, for each set of parameters tested, we ran 1000 additional simulations with no QTL

segregating. We increased the polygenic variance such that the total genetic variance

remained unchanged. We thus determined the empirical threshold by choosing the 95th

percentile from the list of 1000 runs.

A SIMULATION STUDY: The case of a cereal breeding program

We chose the cereal example in the simulation study as it contains most of the difficulties

generally encountered in inbred breeding schemes:

- the frequent unavailability of reliable pedigree information, beyond the parents (and

thus unavailability of ancestor lines to genotyping)

- the potential to genotype only advanced generations of selfing, when the number of

lines has been narrowed down and the trial precision of trials increased, constraining

13

to compute IBD at the F6 or F7 stage without any marker information between the

initial cross and the resulting progenies

- the very high number of parents of the mapping population yielding very small full-sib

families, and an uneven (L shaped) distribution of half-sib family sizes

Simulation of the inbred breeding scheme

Every year, new plant breeding programs are started. The choice of the numerous parents for

crosses combines on one hand lines with the highest genetic merit for some traits of interest

like yield or quality, and on the other hand, some lines of specific interest such as special

quality or pest/disease resistance, sometimes taken in old or exotic material. The original

crossing scheme in a breeding program is very dependent on the breeder for the choice of the

parents, but the breeding process is often closely the same.

To reproduce the steps of the breeding programs, an S-PLUS (2000) function was developed.

This function was designed by analysing wheat breeders information and allele diversity and

frequencies made available by recent studies (DONINI et al. 2000; RODER et al. 2002).

All simulations follow a similar procedure: (Figure 1)

Figure 1 around here

Marker-genotypes construction: Before the start of the breeding process, we considered only

a few founder lines, with full linkage disequilibrium across all their genome, hence between

all the markers and the QTLs. We also imposed that at this generation, no founder lines had

any alleles in common with any other. Thus, line 1 carried only markers and QTLs coded 1,

line 2 only markers and QTLs coded 2, and the last founder line (the NPth) carried only

markers and QTLs coded NP all along its genome. The genome of simulated individuals was

composed of 21 chromosomes of 100 cM, with markers evenly spaced every d centiMorgans

14

all along the chromosomes, with two markers on each chromosome telomeres. On

chromosome 1, we simulated a single QTL between two markers. In addition to the QTL of

interest, we simulated on the other chromosomes Npoly=40 randomly located QTLs (that

could therefore be linked or not) with random effects to simulate the polygenic contribution to

the trait values.

First generation of crosses: From the founder lines generation, circular crosses, i.e. 1*2, 2*3,

… , NP-1*NP, NP*1 were performed. We then derived lines by self-pollination during five

generations in order to obtain F6 lines. NP mixed sub-populations of the same size (same

contribution for all founders for this first generation) were derived, giving the “G0”

generation. The population was considered as totally fixed, by sampling only one gamete after

the last self-pollination. This stage corresponded to the end of a breeding cycle. The

recombination procedure was based on randomly placed chiasmatas with no interference.

Quantitative trait: The parameters for the creation of the quantitative trait are the QTL

heritability (h²QTL) and the heritability of the polygenes (h²poly).

We created at the founder line generation the NP allele effects for the QTL and the Npoly

polygenes. The NP=20 possible effects of the QTL were drawn from a normal distribution

with mean 0 and variance 1. Then the QTL variance (VarQTL) was calculated at the true QTL

position, and the NP=20 effects for each of the Npoly polygenes were extracted from a normal

law with mean 0 and variance [VarQTL*(h²poly/h²QTL)]/Npoly]. Finally, the true variance

accounted by the polygenes was computed (VarPoly), and a random normally distributed noise

with variance 2eσ = [VarQTL *(1/h²QTL-1) - VarPoly ] was added to simulate phenotypic values

of the trait. Thus, the ratio of the additive variance explained by the QTL on the total

phenotypic variance is exactly equal to the specified value h²QTL while the ratio of polygenic

QTL on the total phenotypic variance could be slightly different from the specified h²poly.

Hence, the allele effects and the environmental variance were created at the first generation,

15

and remained constant for all the generations even if the number of alleles decreased.

Nevertheless, the environmental variance was adjusted at the last generation to set the desired

QTL heritability before performing the QTL detection.

Overlapping generations and matrix of crosses: When the genotype and phenotype of the

lines were obtained, virtual breeding schemes have been performed. Two hypotheses were

implemented by extrapolating information obtained by breeders:

- The “overlapping” choice of the parents. All the parents were not necessarily extracted from

the last generation only, but a proportion of them (parameter) could originate from older ones

- The influence of a matrix of crosses on the structure of the resulting progeny of a cross

breeding program, which really influenced the effective population size

The design of crosses at the beginning of a breeding scheme could be seen as a geometric

series, since the representation of parents in the selected progeny is uneven, L-shaped rather

than random. For example, if a given line, say X, is the most cultivated line at a given period

(with the best agronomic performance in a range of environments), X will usually be crossed

to many other lines to fully exploit its genetic value. After self-pollination and selection, a

certain number of lines coming from this parent X will still remain at the F6 stage, and will

form one of the largest half-sib families of all the breeding scheme (containing possibly some

full-sibs when a specific cross is particularly outstanding). On the other hand, an exotic plant

with a really focused interest but with low agronomic performance may also be used to

initiate crosses, but at a smaller scale. Some of its offspring will also probably be selected but

at a much smaller extent.

A matrix of crosses was implemented to reproduce the formation of half-sib and full-sib

families during a cycle of breeding, hence taking into account unbalanced contributions of

parents to the final population. This matrix was filled only upper diagonal, with the parents

sorted from the best ones to the exotic ones. The choice and order of the parents to fill the

16

matrix were based on their anteriority. We considered that the best agronomic ones came from

the closer generations of breeding, and that the exotic ones came from older generations of

breeding. The “overlapping” option extracted 80% of parents from the closest generation of

breeding, and 20% from all the older generations (accounting for 10% of the resulting

progeny). On average, 285 crosses were performed per simulated selection cycle (out of 4750

possible crosses in the full matrix of crosses). Note that each cross gave, on average, 1.75 full-

sibs and that each parent was found, on average, in ten progenies. Thus, each individual is

related, on average, to 8.25 half-sibs and related to 0.75 full-sib.

The resulting half-sib families sizes are presented in figure 2.


Performing n breeding cycles: After the first generation, a loop on the number of breeding

cycles was performed and the parents of crosses, the resulting progenies and the phenotypic

data were stored. All available generations were used to build the next. The last breeding

cycle, NG, was used for QTL detection.

Note that at the beginning, all the allele frequencies were equal, which was not the case after

many generations due to genetic drift and non-panmictic conditions. NP alleles with different

effects were possible at each QTL locus and at each marker. All the markers and QTL were in

full linkage disequilibrium at G0 but were not after many recombinations (5 generations of

self-pollination by cycle, and NG cycles before the mapping generation).

Thus, it is easy to anticipate that the information carried by markers and QTL will be different

in many cases, and that ANOVA based method are likely to be unefficient.

As the goal of this article is not to anticipate the influence of selection on the QTL detection,

we did not performed recurrent pedigree selection on the value of the quantitative trait for the

17

choice of the parents, or intra-cross-selection during the self pollination process. We just

wanted to anticipate the influence of the structure of the population and the effect of the

design of crosses and breeding cycles for a non-selected trait, at non-selected locus.

Design of simulations:

Standard setting: as proposed by XU (1998), instead of performing simulations on a factorial

design of every possible parameter combinations, we chose a standard setting for each

parameter. The conditions of this standard setting were 21 chromosomes of length 100 cM

each with 11 markers spaced every 10 cM, a QTL segregating at position 45 on chromosome

1 with heritability 0.1. Npoly=40 polygenic QTL were randomly placed on the other

chromosomes setting a total genetic heritability (h²g) of 0.435 (with standard deviation 0.051

among the 100 repetitions). For each cycle, about 500 individuals were created at the F6 stage

coming from 100 parents. The studied generation came from the 10th cycle of breeding,

having considered NP=20 founder lines at the beginning of selection.

Comparison of IBD formula (1) and IBD formula (2) for three levels of QTL heritability:

Both formulae were tested for the same set of 100 independent replicates. Three different

quantitative traits per replicate were created at the last generation of breeding scheme with

QTL heritabilities 0.05, 0.1 and 0.2. Total genetic heritability (h²g) was initially set to 0.5 but

after 10 generations of crosses (and genetic drift), some stochastic differences between

replicates appeared. Thus, h²g values are 0.427 (0.063 standard deviation among 100

replicates) for h²QTL=0.05, 0.435 (0.051) for h²QTL=0.1, and 0.456 (0.047) for h²QTL=0.2.

Different experimental conditions: We first tested the robustness of formula (2) for different

marker bias conditions. To avoid stochastic differences between the tested conditions, the 100

same replicates from the standard setting were used before adding bias to the marker

information. Thus, differences between the results within those conditions can be directly

18

attributed to this added bias on marker information. Created biases included (1) missing

marker information (NA) (10% of randomly placed NA in the progenies), (2) portion of non-

IBD Alike-In-State (AIS) alleles (a portion of allele codes were changed to other existing

allele codes in parents and progeny files to obtain 25% of AIS (i.e. not IBD) alleles in the

whole genome).

We then varied out parameters around the standard setting, changing only one parameter at a

time: these parameters include (1) the total number of founder lines in the base population

(NP: 40 vs 20); (2) the number of breeding cycles (NG: 20 vs 10) ; (3) the marker density (d:

20 vs 10) ; (4) the number of lines at the mapping generation (n: 250 vs 500) and number of

parents (m: 50 vs 100)

We also tested the 20 cM marker density condition (3) with 25% of AIS alleles in order to

anticipate the effect of this added bias at a lower marker density. Formula (2) was used for the

analysis of all these variations, except for the NG=20 case, where we tested both formulae.

Some parameters remained constant for all the simulations: the total number of chromosomes

(21), the number of polygenic QTL (40), the position of the QTL on chromosome 1 (45 cM

except for the d=20cM case where the QTL position is 50 cM), the total genetic heritability

originally set to 0.5.

RESULTS

We report the results of the two step variance component analysis on the different settings

below. Due to computer constraints, only every third centiMorgan was tested for the presence

of a QTL. Under each condition, the detection was performed for 100 random replicates.

Parameters estimates and their standard error are reported. The empirical thresholds were

computed from the analysis of 1000 replicated data sets. In the special case of the estimation

of the QTL position, we measured a confidence interval (CI) based on the LOD-Drop-Off

19

(LDO) method (LANDER and BOTSTEIN 1989) calculated for each significant replicate. With

this interval, we calculated the frequency at which the true QTL location was included in the

LOD- Drop-Off CI. We also measured the power that the true QTL position was included in

the CI determined by 4 times the standard error of the estimated position. Finally, for the

significant replicates and at the true QTL position, we calculated the bias on the QTL

heritability induced by our method, for both formulae. We then computed the QTL

heritability over an interval including the true QTL position.


The average likelihood ratio test profiles over 100 replicates for the different settings explored

are presented in Figure 3. As expected, we notice a strong influence of the magnitude of the

QTL effect (i.e. the heritability of the QTL) on the LR profile (Figure 3a). Formula (2) -

which takes into account ancestor pedigree relationships as estimated by markers, to infer the

IBD probabilities - highly outperforms formula (1) for the three levels of QTL heritabilities in

terms of detection power. For subsequent simulations, we used formula (2) only.

The method seems robust to biased information on markers (Figure 3b). Indeed, likelihood

ratio test profiles under conditions supposed to be found in real breeding schemes (i.e. non-

informative alleles or wrongly informative, biased map estimation) shows only small

differences between the different tested conditions, except for the 10% missing marker

information on the last generation where the LR profile lies below the profile obtained with

complete marker information. Alike In State alleles that come from different founder lines –

hence linked to different, non-IBD QTL alleles - have only a small influence on the LR curve.

Figures 3c and 3d show the variations around the standard setting, for the number of founders,

number of breeding cycles, total number of F6 and marker density. For a constant mean size

20

of the derived populations (10 F6 lines per parent), switching from 500 to 250 F6 generates a

strong decrease in detection power (see graph 3c, lower curve). We also notice on this figure a

very low influence of the number of breeding cycles (NG=20) and a strong effect of the

marker informativeness represented by the number of founder lines (NP=40 against 20).

Finally, we notice a low influence of the marker density, except on the smoothness of the

curve which presents informativeness peaks close to the marker positions (Figure 3d).

Nevertheless, with 25% of AIS alleles, the QTL detection power was found more sensitive to

a decrease in marker density from a value of one marker every 20 cM downwards. With a

marker every 10 cM, the decrease in detection power is not very large (see Figure 3b).

Table 1 around here

The ability of the method to accurately estimate the parameters of interest can be judged from

the results presented in Table 1. The accuracy of the QTL estimated position increases with

the switch from formula (1) to formula (2) and also, as expected, with higher QTL

heritabilities. Nevertheless, formula (2) leads us to overestimate both QTL and total genetic

heritabilities, more than formula (1) does.

For the different designs explored, the smaller population size is the parameter which affects

the precision of the QTL position estimate most. On the opposite, the higher number of

founder lines at the beginning of crosses gives the best estimates, as it increases the chance to

have polymorphic markers between the parents. The influence of the number of breeding

generations on the accuracy of the parameter estimates also turned out to be small.

Finally, we notice that the 20 cM density case yields accurate results even if the number of

markers available to build the relationship matrix and infer the IBD is divided by two.

Nevertheless, both heritability estimates are less accurate than with a 10 cM density map.

21

Table 2 around here

The empirical threshold values of LR test statistics over 1000 replicated simulations are

reported in Table 2. For all the designs, the critical values are nearly equivalent. This is not

really surprising as the number of parameters being tested in the random model strategy

remains the same. We also report in Table 2 the average LR test statistics and the power

estimates for a Type I error α=0.05 over 100 replicated simulations. First we notice that the

value of the LR test statistic increased with the value of the QTL heritability and that the

chance to detect QTL is increased by using formula (2).

Biased marker information (non-IBD but AIS marker alleles, missing genotypes) does not

really influence the detection, except for 10% of missing data where the loss in power reaches

14% in comparison to the standard setting. For the different experimental designs, a higher

number of founder lines tends to increase the detection power. Finally, the simulations where

d=20cM and NF6=250 give correct detection powers in comparison to the standard setting.

Table 3 around here

In Table 3, we report the size of the confidence interval determined by four standard

deviations of the estimated position or the drop by one and by two LOD units. These CI have

been established by only taking the significant runs into account. We also report the

frequency at which the true position (i.e. 45 cM) is included in the delimited interval

(percentage of inclusion).

We notice that the drop by two LOD units gives the confidence interval that is closest to 95%

for the percentage of inclusion. In most cases, this interval is larger or equal to the one

delimited by four standard deviations. Nevertheless, as the LOD drop-off method does not

22

give symmetric intervals, it takes into account the information of the curve in a better way.

We also notice that the drop off by one LOD unit gives appropriate confidence intervals only

for the highest QTL heritability.

Table 4, 5, 6 around here

Table 4 shows, for the different levels of QTL heritabilities, the parameter estimates for the

replicates that are significant under the empirical threshold. We notice that, in comparison to

Table 1 which includes all the replicates, the overestimation of the QTL heritability is higher.

However, the estimated position is more accurate when averaging over the significant

replicates only, yielding a smaller standard deviation also.

In Table 5, we present the QTL and total genetic heritabilities at the true QTL position to

detect possible bias from the method under formula (1) and (2). We notice that formula (2)

gives more correct estimates at the true QTL position than formula (1) for all levels of QTL

heritabilities. As formula (2) gives almost unbiased estimates at the true QTL position, a

corrective factor for the heritability could then be worked out as a function of the estimated

position. We thus report in Table 6 an attempt to give more appropriate estimate of the

heritability for this kind of methods. We built an “Averaged Confidence Interval Heritability”

by averaging the heritability over the confidence interval around the position estimate (which

is supposed to include the true QTL position), instead of taking only a point heritability

estimate at the detected position. We notice that this corrective factor yields results very close

to the true parameter value with formula (2) but underestimates both heritabilities with

formula (1).

23

DISCUSSION

Obviously, many statistical methods already exist to map QTL in inbred plant material;

however, these methods mainly focus on a single bi-parental cross. Other methods have been

developed to address more challenging population structures (XIE et al. 1998; XU 1998; Yi

and XU 2001, for example). Nevertheless, these methods do not appear to be easily extended

to highly fragmented populations, at any selfed or back-crossed generation, and coming from

many different parents. They also do not take into account the possibility for alleles to be IBD

if ancestor pedigrees are not available.

In this study, we extended the QTL mapping methodology proposed by GEORGE et al. (2000)

and based on a two-step IBD variance component approach, to typical plant breeding

populations made up of selfed lines which may have either: one or two parents in common,

parents related to each other or not related to each other. In this paper we studied the simplest

possible scenario where no relevant information about the ancestors’ pedigrees was available.

The power and accuracy of this method were assessed using simulated data mimicking

conventional breeding programs in cereals, in an effort to reproduce actual conditions of

marker and gene frequencies and linkage disequilibrium across the parental lines.

In constructing the matrix of IBD probabilities, a more thorough use of the marker data was

achieved by calculating the genetic similarities between the parents. The extent to which the

matrix G was modified from formula (1) to formula (2) is quite large. The proportion of PIBD

values equal to zero with formula (1) – those values between non-sib lines – and replaced by

non-zero values, was equal to 91%, as calculated from the distribution of family sizes featured

in Figure 2. The inferred relatedness patterns between non-sibs leads to a substantial

improvement of the accuracy of the position estimates and of the QTL detection power.

Nevertheless, a downside of this improvement is a stronger overestimation of the QTL

heritability as we shall discuss below.

24

The method performed quite well for all the tested sources of bias (missing marker data, non-

IBD alike-in-state alleles). Two complementary explanations can be put forward to explain

the observed loss in statistical power in the settings with non-IBD AIS alleles, observed with

the lower marker density:

1- the lower chance to have informative markers flanking the interval being scanned.

Thus, informative flanking markers had to be fetched further apart on average, thus

decreasing in turn the estimates accuracy of the putative QTL’s allelic state.

2- A higher proportion of alleles that are AIS but not IBD should also have generated an

upward bias of some of most genetic similarity estimates between the parents. This, in

turn, will have affected the estimates of IBD probabilities and concomitantly the

additive relationships between individuals thereby generating an upward bias of their

estimates for non-full-sibs.

In our population design, there is a strong within family linkage disequilibrium that can be

exploited by comparing the parent’s genotypes to the current F6, which accumulated relatively

few cross-overs. Formula (1) is solely based on the utilization of this linkage disequilibrium,

and is similar to that used by XIE et al. (1998). Formula (2) can be viewed, loosely speaking,

as an attempt to merge, to some extent, several families together on the basis of the likelihood

that the parents share the same alleles identical-by-descent at the putative locus. The power

increase obtained by using formula (2) follows the same principle as that obtained by XIE et

al. (1998) in his Table 7 when he switched from a 100 x 5 sampling strategy to a less

fragmented 50 x 10.

The same comparison can be made to explain the increased detection power observed after 20

breeding cycles instead of 10. Due to the very uneven parental contribution to the crossing

scheme at each breeding cycle, some genetic drift takes place regularly during the breeding

25

cycles leading to the loss of certain haplotypes. Cross-overs increasingly occur between

chromosome blocks with similar haplotypes and are thus genetically ineffective. This could

be compared to a situation where a smaller number of “effective” parents were used, which,

for a constant progeny size, gives enhanced power (XIE et al., 1998). This explanation is

confirmed by the comparison between the use of formula (1) and formula (2) for the same 100

replicates, for the NG=20 setting. Formula (2) anticipates the similarities between the parents

in a better way, hence yielding better detection power.

QTL heritability readjustment before mapping

In preliminary simulations, QTL heritabilities changed, from the founder line generation to

that of the mapping F6 lines, due to random genetic drift and non-panmictic conditions in

small populations, hence yielding different heritabilities at the last generation. What we are

concerned with is the QTL heritability at the current mapping generation to allow sounder

comparison with published results. It therefore seemed sensible to choose that one, as

opposed to that at the founder generation, as an entry parameter for our study. It is also to

conciliate a germplasm history effect and a resulting QTL heritability set as a parameter, that

we did a systematic readjustment of this parameter at the end of the ten breeding cycles. In so

doing, we did not compromise too much with the assessment of the germplasm history effect

as far as allelic frequencies and the ratio of the QTL effect over the other polygenic effects are

concerned.

Overestimation of the QTL heritability and proposals for a corrected heritability

CHARCOSSET and GALLAIS (1996)’s conclusions about the overestimation of h2QTL by the R2

estimator, in the standard fixed-effect ANOVA framework, cannot be directly extended to our

case, where random effects are fitted to our QTLs. In fact, if the model is known, REML

estimates of σu2 and of σv2 are unbiased since the estimates of the fixed effects and the

prediction of the random effects are unbiased – the “U” of the BLUE and BLUP acronyms.

26

However, in QTL mapping, the model is unknown inasmuch as we do not know whether

there is a QTL or not, or if we assume that there is one, we do not know which locus must

have its segregation’s effect fitted in the model.

In this paper, we related the mean ĥ2QTL from all stochastic realisations, in addition to that of

the significant QTL’s only. This allowed us to distinguish between the part of overestimation

due to the Beavis effect (BEAVIS 1994) and other remaining sources of bias. There did remain

some. However, when, in our simulations, h2QTL was estimated at the QTL’s real position only

– which, again, one cannot do over real data - this bias disappeared with formula (2). This

suggests that :

1- the uncertainty over the QTL’s position generates a substantial bias in the QTL

heritability estimates by itself. Since the locus retained is the one that yields a model

with the maximum likelihood ratio (Lmax), it is also likely to be the locus where a

chance association with the residuals component of the phenotype is strongest. Thus,

in terms of expectancy, the residuals vector will, on average, play towards either

decreasing or increasing ĥ2QTL with the same frequency at the QTL’s real position,

whereas it will play towards increasing it more often at Lmax, since it is, by definition,

the locus most strongly associated with the phenotypes.

2- Since formula (2) recovers more of the real information, it was quite expected that

ĥ2QTL, in this case would approach h2QTL more closely and that formula (1) would give

an under-estimate, at the QTL’s real position.

We shall focus on formula (2) (which yields more accurate heritability estimates at the true

QTL position) to propose a correction of the (over)estimated heritability at the position of

maximum likelihood. At this stage, we focused on the significant replicates only to be

representative of what we would find with real data. We plotted the LR curve and that of the

27

putative QTL heritability estimate from many replicates along the scanned chromosome. The

highest LR at a certain position also corresponds to the highest detected QTL heritability as

exemplified by the fact that both curves showed the same pattern, i.e. the estimated QTL

heritability and the LR curve followed proportional values along the y-axis. This property can

be mathematically demonstrated.

For lower levels of QTL heritabilities, the position was poorly estimated for many replicates

(high standard deviation of the position estimates for the set of significant replicates), and the

QTL effects were overestimated. This was interpreted as being due to the Beavis (BEAVIS

1994) effect, because chance association between residuals and genotype can yield a

maximum for the QTL heritability away from the true QTL position.

On the contrary, when drawing the same curves for the replicates with the highest QTL

heritability (i.e. 0.2), we observed the same pattern for the curves (they still follow each other,

as expected), but the detected position was more often very close to the true QTL position

(yielding a smaller standard deviation on QTL position estimates for the set of significant

replicates). Thus, the variance component method applied to high QTL heritabilities did not

yield such a bias on QTL heritability estimates. This was interpreted as due to a lower effect

of the residual component, contrary to the case of lower heritabilities.

To summarize, we pointed out that:

(i) the estimated QTL heritability curve follows the LR curve.

(ii) formula (2) yields the correct QTL heritability estimate at the true QTL position,

(iii) the poor heritability estimates at the detected position are due to wrong position

estimates, and to the averaging of heritabilities, which were obtained at their own

different position estimates (hence not at the true one).

We worked out a sort of Confidence Interval for the estimated heritability, based on the

Confidence Interval for the QTL position estimate: bearing in mind that the expectancy of the

28

position where the estimated heritability is equal to the true one lies at the true QTL position.

Thus a pair of loci that delimits a confidence interval (hence a set of positions) that entails the

true QTL position will also delimit a corresponding set of estimated heritabilities – at the

different likely QTL positions within this interval - that entails the true one. The expectancy

of the heritabilities calculated in the middle of the interval will be higher than the real QTL

heritability by a certain factor that only depends on the real QTL heritability and on the

experimental design. On the other hand, towards the ends of the confidence interval, the

heritability of the QTL would, on average, be underestimated, by a factor that depends not

only on QTL heritability and experimental design but also on the chosen stringency of the

confidence interval. Hence, a less stringent confidence interval will contain a higher

proportion of underestimates. What we observed was that a 95% confidence interval (as

roughly determined by a 2-LOD drop-off interval) seems to contain just the right proportion

of under- and overestimates so that when we average the estimated heritabilities over the

confidence interval, we obtain an unbiased estimate.

Leads for improvement:

Taking all the markers to calculate the genetic similarities between the parents seems to be the

most appropriate solution in order to calculate the ji,π at each putative locus. As for the

polygenic term of the model, one may argue that the calculation of the matrix A (in v ~

(0,Aσv2)) could be more precise if it was based on the markers that are actually linked to some

polygenes, i.e. to some QTLs, instead of using all the markers indiscriminately as we did in

this study. Implementing this scheme would require a forward iterative search procedure

whereby the first step of QTL mapping would be carried out with no polygenic term. The

second step would add a polygenic term with A being equal to the arithmetic mean of the G

matrices of the different positions of the QTL detected in step one. New QTLs could then be

detected due to an increase in power, since some background genetic noise would have been

29

removed or at least, it is expected that the new QTL position estimates will be more precise.

Hence, in the subsequent round these new position estimates would be used to update A. And

the QTL search could be iterated until a convergence criterion is satisfied. This procedure is

somewhat analogous to one option of the Composite Interval Mapping proposed by ZENG

(1994) in which a best subset of markers was chosen by stepwise regression, then used as

cofactors in the linear model adjustment in the QTL search. This procedure, though, can bring

an advantage only if a few QTLs explain the genetic variation as opposed to many with a

small effect, all over the genome.

There is still some scope for a more accurate and probably less biased estimation of the

coefficients of co-ancestries between parents and individuals in order to better estimate the

parameters of the model and increase the QTL detection power. A first attempt to improve the

PIBD estimates could be to subtract from all genetic similarities an estimated proportion of

alleles in common that supposedly unrelated lines have in common – by definition, these

alleles in common would be alike in state only and not IBD. This method was suggested by

MELCHINGER (1991). The use of STRUCTURE (PRITCHARD et al. 2000, FALUSH et al. 2003 for

linked loci), for example, would allow to group parents according to a common selection

history. Our genetic similarities between the parents would be replaced by the scalar products

of parents’ decompositions between the inferred clusters. The remaining of the PIBD

calculation would be identical. Likewise, matrix A could be computed from the same

decomposition.

The proposed method has proved to be powerful in detecting medium sized QTL (h²=0.1) in a

typical set of inbred lines from a complex pedigree, such as those created in common cereal

breeding programs. The use of this kind of methods could increase the relevance and cost

30

effectiveness of quantitative trait loci mapping in applied contexts and could provide an

alternative to the development of specifically designed recombinant population, by exploiting

the genetic variation actually used by plant breeders. JANSEN et al. (2003) proposed to use

parental haplotypes sharing to routinely map QTL in breeding populations. The use of

haplotypes is challenging in this kind of methods where only little information is available on

founders and on the relationships between parents. It is even more challenging with the

increasing use of SNPs versus micro-satellite markers. A mixing of the proposed IBD method

and the method proposed by JANSEN et al. (2003) would be a good solution for mapping QTL

and properly estimate the haplotype effects, at a lower marking cost. One would first detect

QTL within an IBD-based variance component framework at a low marker density then use a

higher marker density for some QTL of interest to build haplotypes and estimate their effects

within a fixed-effect framework. Such locally high density mapping could allow identifying

the haplotypes of minimum length that have the most promising effect. Besides, it would

directly provide markers to manipulate these haplotypes in breeding schemes.

The methodology developed in this article is currently applied to the analysis of real wheat

breeding data.

Acknowledgments: The authors are grateful to the Ministère de l'Economie, des Finances et

de l'Industrie for its financial support (ASG program n° 01 04 90 6058)

31

LITTERATURE CITED

ALMASY,L., and J. BLANGERO, 1998 Multipoint quantitative trait linkage analysis in general

pedigrees. Am. J. Hum. Genet. 62: 1198-1211.

BEAVIS W.D., 1994 The power and deceit of QTL experiments: lessons from comparative

QTL studies. American Seed Trade Association. 49th Annual Corn and Sorghum Research

Conference. Washington D.C.

BERNARDO, R., 1993 Estimation of coefficient of coancestry using molecular markers in

maize. Theor. Appl. Genet. 85: 1055-1062.

BINK, M. C. A. M., P. UIMARI, M. SILLANPÄÄ, L. JANSS, and R. JANSEN, 2002 Multiple QTL

mapping in related plant populations via a pedigree-analysis approach. Theor. Appl. Genet.

104: 751-762.

CHARCOSSET, A., and A. GALLAIS, 1996 Estimation of the contribution of quantitative trait

loci (QTL) to the variance of a quantitative trait by means of genetic markers. Theor. Appl.

Genet. 93: 1193-1201.

COCKERHAM, C. C., 1983 Covariances of relatives from self-fertilization. Crop Science 23:

1177-1180.

DONINI P., J. R. LAW, R. M. D. KOEBNER, J. C. REEVES and R. J. Cooke, 2000 Temporal

trends in the diversity of UK wheat. Theor Appl Genet 100: 912-917.

FALUSH D., M. STEPHENS and J. K. PRITCHARD, 2003 Inference of population structure using

multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164: 1567-

1587.

GEORGE A. W., P. M. VISSCHER and C. S. HALEY, 2000 Mapping quantitative trait in complex

32

pedigrees: a two-step variance component approach. Genetics 156: 2081-2092.

GILMOUR, A. R., B. R. CULLIS, S. J. WELHAM and R. THOMPSON, 1998 ASREML. Program

User Manual. Ed. Orange Agricultural Institute, New South Wales, Australia.

GIMELFARB, A., and R. LANDE, 1994 Simulation of marker-assisted selection in hybrid

populations. Genet. Res. 63: 39-47.

GIMELFARB, A., and R. LANDE, 1995 Marker-assisted selection and marker-QTL associations

in hybrid populations. Theor. Appl. Genet. 91: 522-528.

HALEY, C.S., and S. KNOTT, 1992 A simple regression method for mapping quantitative trait

loci in line crosses using flanking markers. Heredity 69: 315-324.

HARRIS, D. L., 1964 Genotypic covariances between inbred relatives. Genetics 50: 1319-

1348.

HOSPITAL, F., L. MOREAU, F. LACOUDRE, A. CHARCOSSET and A. GALLAIS, 1997 More on the

efficiency of marker-assisted selection. Theor. Appl. Genet. 95: 1181-1189.

JANSEN, R. C., 1993 Interval mapping of multiple quantitative trait loci. Genetics 135: 205-

211.

JANSEN R. C., J-L. JANNINK and W.D. BEAVIS, 2003 Mapping quantitative trait loci in plant

breeding populations: use of parental haplotype sharing. Crop Sci. 43: 829-834.

JIANG, C., and Z-B ZENG,, 1995 Multiple trait analysis of genetic mapping for quantitative

trait loci. Genetics 140: 1111-1127.

KEMPTHORNE, O., 1955 The correlation between relatives in inbred populations. Genetics 40:

681-691.

33

KOROL, A., Y. RONIN, and V. KIRZHNER, 1995 Interval mapping of quantitative trait loci

employing correlated trait complexes. Genetics 140: 1137-1147.

LANDE R., and R. THOMPSON, 1990 Efficiency of marker-assisted selection in the

improvement of quantitative traits. Genetics 124: 743-756.

LANDER, E., and D. BOTSTEIN, 1989 Mapping mendelian factors underlying quantitative traits

using RFLP linkage maps. Genetics 121: 185-199.

LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer

Associates, Sunderland, MA.

MALECOT G., 1948 Les mathématiques de l'hérédité, Ed. Masson et Cie, Paris.

MELCHINGER A. E., M. M. MESSMER, M. LEE, W. L. WOODMAN and K. R. LAMKEY, 1991

Diversity and relationships among U.S. maize inbreds revealed by restriction fragment length

polymorphisms. Crop Sci. 31: 669-678.

MOREAU L., S. LEMARIÉ, A. CHARCOSSET, and A. GALLAIS, 2000 Economic efficiency of

one cycle of marker-assisted selection. Crop Sci. 40: 329-337.

MURANTY, H., 1996 Power of tests for quantitative trait loci detection using full-sib families

in different schemes. Heredity 76: 156-165.

NEI M. and W. H. LI, 1979 Mathematical model for studying genetic variations in terms of

restriction endonucleases. Proc. Natl. Acad. Sci. 76: 5369-5373.

PRITCHARD J. K., M. STEPHENS and P. DONNELLY, 2000 Inference of population structure

using multilocus genotype data. Genetics 155: 945-959.

RODER, M. S. , K. WENDEHAKE, V. KORZUN, G. BREDEMEIJER, D. LABORIE, D. et al., 2002

34

Construction and analysis of a microsatellite-based database of European wheat varieties.

Theor. Appl. Genet. 106: 67-73.

SERVIN, B., C. DILLMANN, G. DECOUX and F. HOSPITAL, 2002 MDM a program to compute

fully informative genotype frequencies in complex breeding schemes. J. Hered. 93(3): 227-

228.

S-PLUS, 2000 S-PLUS guide to statistical and mathematical analyses. MathSoft,

Massachusetts Institute of Technology

XIE, C., D. D. G. GESSLER and S. XU, 1998 Combining different line crosses for mapping

quantitative trait loci using the identical by descent-based variance component method.

Genetics 149: 1139-1146.

XU, S. 1998 Mapping quantitative trait loci using multiple families of line crosses. Genetics

148: 517-524.

XU, S., and W. R. ATCHLEY, 1995 A random model approach to interval mapping of

quantitative trait loci. Genetics 141: 1189-1197.

YI, N., and S. XU, 2001 Bayesian mapping of quantitative trait loci under complicated mating

designs. Genetics 157: 1759-1771.

ZENG, Z-B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457-1468.

ZENG, Z-B., 1993 Theoritical basis of separation of multiple linked gene effects on mapping

quantitative trait loci. Proc. Natl. Acad. Sci. USA. 90: 10972-10976.

35

TABLE 1

Estimates of the position, QTL and total genetic heritabilities; h²QTL and h²g respectively

Experimental design h² g Position ĥ2QTL ĥ2g

(1) QTL heritability and formula

h²QTL=0.05

Formula (1)

Formula (2)

0.427 (0.068) 49.44 (25.42)

46.45 (21.84)

0.067 (0.035)

0.085 (0.041)

0.427 (0.108)

0.432 (0.107)

h²QTL=0.1 Formula (1)

Formula (2)

0.435 (0.051) 47.76 (20.88)

45.52 (17.72)

0.099 (0.042)

0.129 (0.056)

0.442 (0.081)

0.450 (0.083)


Formula (2)

0.456 (0.047) 44.91 (7.35)

45.73 (6.89)

0.192 (0.058)

0.228 (0.065)

0.451 (0.083)

0.467 (0.091)

(2) Different experimental designs

Standard setting 45.52 (17.72) 0.129 (0.056) 0.450 (0.083)

AIS 25% 47.15 (17.99) 0.135 (0.060) 0.445 (0.084)

NA 10%

0.435 (0.051)

48.55 (18.87) 0.131 (0.098) 0.434 (0.094)

NP=40 0.451 (0.067) 48.42 (13.75) 0.136 (0.053) 0.446 (0.108)

NG=20 formula (1) 0.413 (0.059) 48.00 (18.75) 0.091 (0.039) 0.430 (0.091)

NG=20 formula (2) 0.413 (0.059) 46.89 (15.96) 0.145 (0.049) 0.436 (0.094)

d=20cM

d=20, AIS 25%

0.433 (0.052)

0.433 (0.052)

52.27 (23.10)

52.87 (24.80)

0.149 (0.070)

0.147 (0.076)

0.395 (0.088)

0.399 (0.087)

Npar=50, NF6=250 0.403 (0.077) 49.86 (23.25) 0.159 (0.070) 0.441 (0.149)

The standard setting is a QTL heritability of 0.1, a total genetic heritability (h2g) of 0.435

(with standard deviation 0.051 among the 100 repetitions) and a 10 cM marker density, with a

QTL at 45 cM. About 500 individuals are created at the F6 stage coming from 100 parents for

each breeding cycle. h2g is the result of the readjustment of the residuals in order to obtained

a given target h2QTL. The studied generation comes from the 10th cycle of breeding, having

considered 20 founder lines at the beginning of selection.

36

(1) Each set of runs differs from the simulated QTL heritability noted in column one, and

by the IBD formula tested. IBD formula (1) takes only into account Half-Sib and Full-

Sib relationships while formula (2) adds ancestor relationships estimated by markers.

(2) Each additional set of runs differ from the standard setting by the simulation

parameter noted in column one. Except the NG=20 case, all the other settings are

tested with formula (2) only.

Mean and standard deviations (in parentheses) are calculated among the 100 replicates.

37

TABLE 2

LR threshold, test statistic and QTL detection power

Threshold Test statistic Power (%)

(1) QTL heritability and formula

h²QTL=0,05 formula (1) 4.08 4.62 (3.52) 47

h²QTL=0,05 formula (2) 3.96 5.23 (3.97) 58

h²QTL=0,1 formula (1) 4.08 7.75 (5.06) 71

h²QTL=0,1 formula (2) 3.96 9.76 (6.71) 80

h²QTL=0,2 formula (1) 4.08 22.03 (10.9) 100

h²QTL=0,2 formula (2) 3.96 23.69 (10.8) 100


Standard setting 3.96 9.76 (6.71) 80

AIS 25% 4.28 9.91 (6.56) 72

NA 10% 4.30 7.81 (5.53) 66

NP=40 3.70 11.29 (7.09) 89

NG=20 formula (1) 3.72 5.95 (4.15) 65

NG=20 formula (2) 3.62 10.62 (5.75) 91

d=20 4.25 9.21 (6.81) 75

d=20, AIS 25% 4.94 8.06 (6.50) 64

Npar=50, NF6=250 4.00 7.96 (3.66) 62

See Table 1 for the standard setting. Each additional set of runs differs from the standard

setting by the parameter change noted in column one. Threshold represents the empirical

threshold calculated for 1000 replicates. Test statistic is the mean and standard deviation of

the maximum of LR test for the 100 replicates. Power is the percentage of replicates with max

LR exceeding the empirical threshold.

38

TABLE 3

Accuracy of three methods used to infer Confidence Intervals (CI)

CI- 4 S.D. % 4 S.D. CI- 1 LDO % 1 LDO CI- 2 LDO % 2 LDO

(1) QTL heritabilities and IBD formula

h²QTL=0,05 formula (1) 86.40 91 41.78 79 86.66 97

h²QTL=0,05 formula (2) 72.20 93 41.18 79 85.43 97

h²QTL=0,1 formula (1) 63.28 90 29.10 82 74.75 99

h²QTL=0,1 formula (2) 63.40 93 31.04 86 66.62 96

h²QTL=0,2 formula (1) 29.40 96 17.26 90 34.33 97

h²QTL=0,2 formula (2) 26.00 98 15.21 91 28.21 97


Standard Setting 63.40 93 31.04 86 66.62 96

AIS 25% 63.84 92 26.4 80 62.9 97

NA 10% 63.24 91 28.48 80 67.6 95

NP=40 52.24 94 26.66 85 60.34 97

NG=20, formula (1) 67.60 91 43.95 72 82.80 96

NG=20, formula (2) 62.20 91 27.30 76 66.44 97

D=20 93.60 96 36.35 75 73.96 94

D=20, AIS 25% 94.56 95 35.08 67 75.61 91

Npar=50, NF6=250 88.16 89 45.61 79 82.82 95

See Table 1 for the standard setting. “CI- 4 S.D.” represents the size of the CI calculated as

four times the standard deviation on the estimated QTL position for the significant runs, and

“% 4 S.D.” represents the percentage of times that the true position is included within the

delimited interval. “CI- 1 LDO” and “CI- 2 LDO” represent the size of the CI as determined

by the drop of one and two LOD unit (multiply by 2*ln(10) to convert in LR units)

39

respectively. “% 1 LDO” and “% 2 LDO” represent the number of times that the true position

is included within the delimited interval.

40

TABLE 4

Estimates of the QTL parameters for different levels of QTL heritability under the

empirical threshold

Experimental design h² g Position ĥ2QTL ĥ2g

QTL heritability

h²QTL=0.05

Formula (1)

Formula (2)

0.427 (0.068) 47.55 (21.61)

45.98 (18.05)

0.092 (0.032)

0.110 (0.030)

0.429 (0.110)

0.437 (0.109)


Formula (2)

0.435 (0.051) 49.56 (15.82)

45.56 (15.85)

0.117 (0.035)

0.143 (0.051)

0.452 (0.081)

0.450 (0.08)


Formula (2)

0.456 (0.047) 44.91 (7.35)

45.49 (6.50)

0.192 (0.058)

0.230 (0.062)

0.451 (0.083)

0.471 (0.081)

Each set of runs differs by the simulated QTL heritability noted in column one, and by the

IBD formula tested. Mean and standard deviations (in parentheses) are reported only for the

significant replicates (noted in the “Power” column of Table 2) among 100.

41

TABLE 5

Estimates of the QTL and total genetic heritabilities at the true QTL position (45)

h²QTL =0.05, h2g=0.427 h²QTL =0.1, h2g=0.435 h²QTL =0.2, h2g=0.456

Formula ĥ2QTL ĥ2g ĥ2QTL ĥ2g ĥ2QTL ĥ2g

(1)

(2)

0.036 (0.037)

0.051 (0.047)

0.426 (0.106)

0.430 (0.108)

0.068 (0.048)

0.094 (0.064)

0.442 (0.080)

0.447 (0.080)

0.191 (0.056)

0.225 (0.066)

0.455 (0.081)

0.471 (0.094)

Formula (1) and (2) are tested for three levels of QTL heritability at the true QTL position.

h2g is the result of the readjustment of the residuals in order to obtained a given target h2QTL.

42

TABLE 6

Averaged Confidence Interval Heritability for the significant replicates over the 2 Lod-

Drop-off units interval

heritabilities h²QTL =0.05, h2g=0.427 h²QTL =0.1, h2g=0.435 h²QTL =0.2, h2g=0.456

No-Selection ĥ2QTL ĥ2g ĥ2QTL ĥ2g ĥ2QTL ĥ2g

(1)

(2)

0.049 (0.028)

0.054 (0.034)

0.405 (0.104)

0.418 (0.109)

0.072 (0.039)

0.100 (0.063)

0.434 (0.083)

0.438 (0.081)

0.158 (0.062)

0.194 (0.067)

0.438 (0.085)

0.459 (0.082)

Formula (1) and (2) are tested for three levels of QTL heritability.

h2g is the result of the readjustment of the residuals in order to obtained a given target h2QTL.

FIGURE 1

1 2 3 NP Q1 Q2 Q3 QNP 1 2 3 NP

P1 P2 P3 P4 (…) P99 P100

matrix of crosses

* * * *

P1 P2 P79 P80

Founder lines: full linkage disequilibrium marker-QTL

G0 generation

For i=10 breeding cycles

Gi generation 500 F6 lines

QTL Detection

1 to 4 progenies per cross + half-sib relationships

circular crosses

G10 generation

+ P81-P100 : overlapping

NP sub-populations

Creation of the QTL and polygene effects

44

0

10

20

30

40

50

60

70

80

P1 P6 P16

P26

P36

P46

P56

P66

P76

P86

P96

Parent name

Num

ber o

f pro

geni

es /

pare

nt

MeanStandard Deviation

FIGURE 2

45

FIGURE 3

46

FIGURE LEGENDS: FIGURE 1: Simulation of the breeding scheme

FIGURE 2: Mean and standard error for 1000 replicates created by the “matrix of crosses”

function for the half-sib family size under the standard setting (100 parents, 500 F6).

FIGURE 3: Comparison of the LR profiles for (a) different levels of QTL heritabilities for

formula (1) and (2), (b) different biased marker information for the standard setting under

formula (2), (c) difference in the breeding schemes compared to the standard setting, and (d) a

density of one marker every 20 cM, with 25% of AIS non-IBD alleles.

IBD- based QTL detection in multi-cross inbred designs: A...

Documents

Transcript of IBD- based QTL detection in multi-cross inbred designs: A...