IBD- based QTL detection in multi-cross inbred designs: A...

46
1 IBD- based QTL detection in multi-cross inbred designs: A case study of cereal breeding programs Sébastien Crepieux *,1 , Bertrand Servin , Claude Lebreton and Gilles Charmet * *UMR 1095 INRA-UBP, 234 Av. du Brezet, 63039 Clermont-Ferrand Cedex 2, France, INRA UMR de Génétique Végétale, INRA/UPS/INAPG, 91 190 Gif sur Yvette, France Limagrain Agro-Industrie, site d’ULICE, av G. Gershwin, BP173, F-63204 Riom Cedex, France

Transcript of IBD- based QTL detection in multi-cross inbred designs: A...

  • 1

    IBD- based QTL detection in multi-cross inbred designs: A case study of

    cereal breeding programs

    Sébastien Crepieux*,1, Bertrand Servin† , Claude Lebreton‡ and Gilles Charmet*

    *UMR 1095 INRA-UBP, 234 Av. du Brezet, 63039 Clermont-Ferrand Cedex 2, France,

    †INRA UMR de Génétique Végétale, INRA/UPS/INAPG, 91 190 Gif sur Yvette, France

    ‡Limagrain Agro-Industrie, site d’ULICE, av G. Gershwin, BP173, F-63204 Riom Cedex,

    France

  • 2

    IBD-based multi-cross QTL mapping

    1 : corresponding author : 1095 INRA-UBP, 234 Av. du Brezet, 63039 Clermont-Ferrand

    Cedex 2, France. Email : [email protected]

    Phone : 00 33 4 73 62 43 09 , Fax: 00 33 4 73 62 44 53

    Key words: QTL detection, variance component, pedigree breeding, IBD, multi-cross

    ABSTRACT

    Mapping quantitative trait loci in plants is usually conducted using a population derived from

    a cross between two inbred lines. The power of such QTL detection and the parameter

    estimates highly depend on the choice of the two parental lines. Thus, the QTL detected in

    such populations only represent a small part of the genetic architecture of the trait. Besides,

    the effects of only two alleles are characterised, which is of limited interest to the breeder. On

    the other hand, common pedigree breeding material remains unexploited for QTL mapping. In

    this study, we extend QTL mapping methodology to a generalized framework, based on a

    two-step IBD variance component approach, applicable to any type of breeding population

    coming from inbred parents. The power and accuracy of this method were assessed on

    simulated data mimicking conventional breeding programs in cereals. This method can

    provide an alternative to the development of specifically designed recombinant population, by

    exploiting the genetic variation actually managed by plant breeders. The use of these detected

    QTL in assisting breeding would thus be facilitated.

  • 3

    INTRODUCTION

    The availability of molecular markers in the 1980’s has opened new scope for quantitative

    genetics and breeding. It was thus anticipated that the manipulation of loci underlying

    quantitative traits (QTL) would be as easily feasible as with mendelian factors. This

    perspective, however, has remained largely unreached, despite the large corpus of theoretical

    studies on marker assisted selection ( e.g. LANDE and THOMPSON 1990; GIMELFARB and

    LANDE 1994 , 1995 ; HOSPITAL et al. 1997). The main reason is probably the cost of markers

    and the relatively low improvement in selection efficiency that leads MAS to be generally

    much more expensive than conventional breeding (MOREAU et al. 2000). The other reason is

    that applied breeding programs and QTL research are often disconnected, i.e. carried out by

    different teams and using different plant material.

    Classically, QTL analyses are carried out on a few progenies from broad base crosses, coming

    from a small number of distantly related lines, often including wild relatives. Such analyses

    mostly involve bi-parental progenies such as back-crosses (BC), doubled haploids lines (DH),

    F2 or recombinant inbred lines (RILs). In the approaches based on this kind of plant material,

    the effect of an allele substitution at a candidate locus is tested. This is called the fixed model

    approach (XU and ATCHLEY 1995) since it considers a fixed number of distinct alleles (most

    often two) at each putative QTL. Statistical methods for the QTL analysis of bi-parental

    populations underwent successive improvements through the advent of Interval Mapping

    (LANDER and BOTSTEIN 1989) and its linearization (HALEY and KNOTT 1992), the Composite

    Interval Mapping (ZENG 1993, 1994 ; JANSEN 1993) and multiple trait QTL mapping (KOROL

    et al. 1995; JIANG and ZENG).

    On the other hand, the breeder’s material is far from the studied bi-parental populations.

    Breeders generally handle many small families from crosses between (often) highly related

    elite lines. Thus, the above described methods are poorly adapted. Moreover there are many

  • 4

    drawbacks for the breeder’s use of the QTL found on bi-parental populations. First, when

    only two parents are considered, some markers and potential QTLs are more likely to be

    monomorphic, even if parental lines are carefully selected for trait divergence. Since, by

    definition, QTL can only be found at polymorphic sites in the genome, the expected number

    of QTL detected with a bi-parental cross will be lower than that expected when analyzing

    several crosses at a time (assuming the total number of genotypes is not the limiting factor).

    The second drawback is that the QTL effect is estimated as a contrast between two alleles and

    in one genetic background only. Therefore, in that context, the improvement of a line by the

    introgression of a QTL allele in a completely new genetic background is rather unpredictable,

    because of possible epistatic interaction between QTL and genetic background. Finally, from

    an economic standpoint, the cost of creation of large single cross progenies and specific trials

    for trait evaluation to perform QTL detection is quite high and often at the expense of other

    selection programs.

    All these drawbacks reduce the breeders’ interest for implementing such experimental designs

    when funding and work are constrained. Bi-parental crosses are usually preferred for more

    upstream studies, e.g. genomics: the fine-mapping of a QTL, which is a pre-requisite for its

    positional cloning, is easier when fewer QTLs are segregating. In contrast, the breeders’ focus

    will be to characterize the effect of a wide range of alleles in his germplasm. Methods for

    simultaneous detection and manipulation of QTL in breeding programs would thus enhance

    the applicability of MAS.

    In plant breeding, new methods for QTL detection in complex designs, close to those used in

    real breeding schemes have already been developed. MURANTY (1996) suggested to work in

    plants with progenies from several parents, in order to achieve a high probability to have more

    than one allele at a putative QTL, and also to have a more representative estimate of the

    variance accounted for by a QTL. She operated in a fixed-effect framework. Simulations

  • 5

    demonstrated that a higher QTL detection power was achieved, for a given sample size. XU

    (1998) compared the QTL detection powers obtained with random effect models and fixed

    effects and found similar values for individual family sizes as low as 25 individuals.

    However, in more unbalanced designs, the random effect approach was presumed to be more

    suited as it can handle any arbitrary pedigree of individuals (LYNCH and WALSH 1998, XU

    1998). Efficient methodologies for more fragmented populations in plants have been

    developed (XIE et al. 1998; YI and XU 2001, BINK et al. 2002, JANSEN et al. 2003 for

    example), but their extension or implementation for any complex plant designs, implying a

    mixing of half-sibs and full-sibs families of different sizes, at any generation of selfing and

    with the hermaphrodite status of parents, is not straightforward. The identical-by-descent

    (IBD)-based variance component analysis is a powerful statistical method for QTL mapping

    in complex populations and can be used in pedigrees of arbitrary size and complexity

    (ALMASY and BLANGERO 1998). These IBD-based variance component analyses are derived

    from the assumption that individuals of similar phenotype are more likely to share alleles that

    are identical by descent. The construction of IBD matrices for alleles at each tested position

    along the genome, and the fitting of random effect models (which assumes that QTL effects

    are normally distributed) offer an appropriate method to map QTL if the progeny population

    is large enough and if the progenies are connected in some way. Besides, these models do not

    need to assume a known, finite set of alleles at each putative QTL. Thus, they offer a less

    parameterized statistical environment in which to map QTL, because only the variances need

    to be estimated instead of every allele substitution effect. Generally, the IBD-based

    approaches assume a between family IBD-likelihood of zero (i.e. no parents in common

    between the two families), and thus, consider the parents as founders. However, this

    assumption is often wrong in common breeding pedigrees. Furthermore, in fragmented

    situations, i.e. where there are many families of small sizes (especially when the genotyping

  • 6

    takes place at a late stage in pedigree breeding, where we may easily end up with as few as

    one or two lines per cross), the IBD-likelihood matrix can be very sparse. Hence, there could

    be much to be gained in exploring the actual between family IBD-likelihoods.

    In the work related in this paper, we took over these developments and further assumed a non-

    zero IBD-likelihood between non-sib lines. We then present a unified IBD-based variance

    component analysis framework, to map QTL in any kind of multi-cross designs involving

    self-pollinating species, at any generation. To test the accuracy of the method, we developed a

    simulation program which mimics the steps of real breeding schemes. We chose the

    simulation parameters according to the information provided by breeders on their real

    material in order to be as comprehensive as possible in the range of genetic configurations

    explored.

    This method can provide an alternative to the development of specifically designed

    recombinant population, by exploiting the genetic variation actually managed by plant

    breeders. Our method can thus provide breeders with valuable information about the breeding

    values of their material and help them to design selection strategies.

    METHODS

    Two-step IBD based variance component method

    The method used to map QTL in a complex inbred pedigree is a two step variance component

    method, as described in GEORGE et al. (2000). Hence, this method first consists in

    constructing the (co)variance matrix of fixed and random effects at each putative QTL

    position and then estimates the likelihood of the presence of a QTL at these positions using

    appropriate linear models.

  • 7

    These two steps are common to all interval mapping based variance components methods.

    What differs mainly among all the published methods is the way to calculate the IBD

    probabilities (see GEORGE et al. 2000 for a review of IBD probabilities calculation). We

    adopted a deterministic approach to infer IBD probabilities for any generation of

    recombination and breeding scheme (e.g. F2, Fn, RIL, BCn), based on the MDM program

    (SERVIN et al. 2002).

    Mixed linear models

    We assume that the quantitative trait is a linear combination of fixed design effects, a putative

    QTL effect (with additive or/and dominance effect) and additive polygenic effects. The

    random polygenic effect is seen as the cumulative effect of all loci affecting the quantitative

    trait that are unlinked to the QTL. The model is:

    )1(eZvZuXy +++= β

    where y is an (m*1) vector of phenotypes, X is an (m*s) design matrix, β is a (s*1) vector of

    fixed effects, Z is an (m*q) incidence matrix relating records to individuals, u is a (q*1) vector

    of additive QTL effects, v is a (q*1) vector of additive polygenic effects and e is the residual.

    We assume that the random effects u, v and e are uncorrelated and distributed as multivariate

    normal densities: ),0(~;),0(~;),0(~ 222 evu INeANvGNu σσσ , with 222 , evu and σσσ being

    respectively the additive variance of the QTL, the polygenic variance and the residual

    variance. A is the (q*q) additive genetic relationship matrix; G is the (q*q) (co)variance

    matrix for the QTL additive effects conditional on marker information; and I is the (m*m)

    identity matrix.

    The model without QTL segregating in the population is, with the same notations:

    )2(eZvXy ++= β

  • 8

    Computation of the IBD probabilities

    We consider a mapping population composed of several sub-populations of small size. Each

    of these sub-populations is an offspring coming from an inbred pedigree started with two

    parents. For example, these sub-populations could be produced by several consecutive

    selfings (e.g. RILs) or back-crossings. We want to compute the probability that two

    individuals taken from any of these sub-populations share IBD alleles at a given locus of their

    genome. If we consider a pair of individuals from the mapping population, they may be (i)

    taken from the same sub-population, in which case they are full-sibs (ii) taken from two

    different sub-populations. In this last case, if one of the parents is common to the two sub-

    populations, the two individuals will be half-sibs, if the parents of the two sub-populations are

    distinct, the two individuals are considered as unrelated. We will now draw relevant

    calculations of the IBD probabilities for each of these cases.

    IBD value between two sibs at a QTL: Within each full-sib family of the breeding scheme,

    only two alleles are segregating giving only three possible genotypes at the QTL: QQ, Qq and

    qq.

    Following XIE et al. (1998) notations, the IBD value of two individuals i and j, is measured

    as:

    QqQqorQqqqqqqqor

    qqQQQqQQQQQQ

    forforfor

    jiji −−−

    −−−

    == ,

    ,,,

    012

    2 ,, θπ

    where ji,π are the ijth elements of G, and ji,θ is MALECOT’s (1948) coefficient of coancestry.

    As pointed out by many authors, when inbreeding is present, ji,π is not interpreted as the

    proportion of alleles IBD, but rather as twice the coefficient of coancestry (KEMPTHORNE

    1955; HARRIS 1964; COCKERHAM 1983).

  • 9

    Inferring the IBD value of a QTL from markers: The IBD value is completely determined

    by the genotypes of two individuals at the QTL of interest. The actual QTL genotype of an

    individual, however, is not observable, and must be inferred from flanking marker

    information. We denote the following probabilities:

    )/Pr(),/Pr(),/Pr( 012 MjMjMj IqqpandIQqpIQQp === .

    We write [ ] Tiiii pppp 012= and [ ]Tjjjj pppp 012= . The conditional expectations of the

    IBD values between two full sibs are:

    jTiMjiji CppIE == )/( ,, ππ for between individuals,

    and jT

    Mjiji pcIE == )/( ,, ππ for the individual with itself, where:

    =

    =

    121

    210111012

    candC

    In the rest of the paper, this formula, for computing IBD probabilities, will be referred to as

    formula (1).

    General case: Above, we emphasized the calculation of IBD for the full-sib case. The

    generalization to the half-sib case is trivial.

    Using this first formula to compute IBD probabilities, we assumed that parents of sub-

    populations were unrelated, i.e. they did not share any common ancestors. However, in

    practice, these parents were coming from previous generations of breeding and were very

    likely to share IBD alleles in their genome (due to the intensive use of some “star varieties”,

    for example). In order to take these possible relationships between parents into account, we

    estimated coancestry coefficients between them, using molecular information, as suggested by

    BERNARDO (1993). The pedigree information, if available, could be used to that end.

    However, the selection pressure may generate a discrepancy in the predicted proportion of

    parental genomes shared by the current lines. Besides, we assumed that the pedigree

  • 10

    information was often very scarce or even unavailable. Hence, we resorted to a genetic

    similarity based index to estimate that proportion of genome.

    First, we generalized the C matrix to the known half-sib, full-sib and unrelated individuals by

    introducing the coancestries between parents, estimated by markers in our case. We

    considered two individuals taken in two sub-populations. Let P1 and P2 be the parents of the

    first sub-population and P3 and P4, the parents of the other sub-population. We denote by

    GSP1P3, GSP1P4, GSP2P3 and GSP2P4 the estimates of the coefficients of coancestry between

    these parents. Taking into account possible coancestries between P1 and P3 on one hand and

    P2 and P4 on the other hand, the C matrix can then be re-written as:

    +=

    4242

    42423121

    31

    3131

    1

    20)(

    02

    PPPP

    PPPPPPPP

    PPPP

    GSGSGSGSGSGS

    GSGSC

    Note that in the full-sib case P1=P3 and P2=P4, so that GSP1P3 and GSP2P4 are equal to one

    and the C1 matrix is similar to C. Similarly, the relevant C matrices for half-sibs individual or

    for unrelated individuals can be obtained by replacing respectively GSP1P3 by one and GSP2P4

    by zero on one hand, and both GSP1P3 and GSP2P4 by zero on the other hand.

    Similarly, taking into account the coefficients between the parents P1 and P4 on one hand and

    P2 and P3 on the other hand, we can re-write the C matrix as :

    +=

    02)(21

    20

    4141

    32413241

    3232

    2

    PPPP

    PPPPPPPP

    PPPP

    GSGSGSGSGSGSGSGS

    C

    Finally, we can draw a general formula for the conditional expectation of the IBD values

    between two individuals coming from four (distinct or not) inbred parents:

    jTij

    TiMjiji pCppCpIE 21,, )/( +== ππ

  • 11

    ))])([()])([(

    )])([()])([((2)/(..

    121

    0121

    042121

    2121

    032

    121

    0121

    241121

    2121

    231,,

    iijjPPiijjPP

    iijjPPiijjPPMjiji

    ppppGSppppGS

    ppppGSppppGSIEei

    ++++++

    +++++== ππ

    The conditional expectation of the IBD for an individual with itself remains:

    12,, 2)/( jjMjiji ppIE +== ππ

    In the rest of the paper, this formula, using the C1 and C2 matrices, will be referred to as

    formula (2).

    The elements in the additive relationship matrix A are estimates of the proportion of IBD

    genome between any two lines. The ai,j elements is the proportion of IBD genome between

    line i and line j, based on genetic similarities.

    For both A and G matrices, genetic similarities (GSi,j) were computed using NEI and LI

    (1979) formula.

    Implementation of the IBD formula

    We used the deterministic approach of the MDM program (SERVIN et al. 2002) to compute all

    the pi and pj probabilities. IBD-likelihoods were computed every 3cM. Two flanking markers

    were used to infer the genotypes probabilities. In the frequent case where the two parents

    shared the same marker alleles at one or two loci flanking the putative QTL position, the next

    closest markers to the interval were used. It can easily be demonstrated that the IBD

    probabilities calculated at a putative QTL will be more precise if the flanking markers are

    highly polymorphic. Issues on the use of better estimates of the coefficients of co-ancestries

    will be discussed on the last part of the article.

    All the G matrices were then inverted, and written in ASREML (GILMOUR et al. 1998) format

    for user-defined inverse (co)variance matrices. We also computed the additive relationship

    matrix A. Then, it was inverted and written in ASREML format (end of step1).

    In step 2, ASREML provided restricted maximum-likelihood (REML) estimates of (1) and

    (2). To test for the presence of a QTL against no QTL at a particular chromosomal position,

  • 12

    we used the Log Likelihood Ratio test: LR= -2ln( L0(H0, no QTL present) – L1(H1, QTL

    present) ), where L1 and L0 represent the likelihood values of (1) and (2) evaluated at the

    REML solutions, respectively.

    Test statistic under the null hypothesis

    The choice of the threshold in this kind of population is always challenging. Many

    publications (ZENG 1994; XU and ATCHLEY 1995 for example) report that when a

    chromosomal interval is being scanned, the empirical distribution of LR follows a mixture of

    two Chi-square distributions, with one and two degrees of freedom, respectively.

    Since this article deals with simulated data, it is possible to replicate data under the null

    hypothesis of no QTL segregating, construct the empirical distribution of LR and derive

    empirical threshold by choosing the 95th percentile of the highest test statistic, generally over

    500 or 1000 stochastic realizations. In this paper, we calculated an empirical threshold for

    every set of parameters to see whether significant differences on threshold appeared or not.

    Hence, for each set of parameters tested, we ran 1000 additional simulations with no QTL

    segregating. We increased the polygenic variance such that the total genetic variance

    remained unchanged. We thus determined the empirical threshold by choosing the 95th

    percentile from the list of 1000 runs.

    A SIMULATION STUDY: The case of a cereal breeding program

    We chose the cereal example in the simulation study as it contains most of the difficulties

    generally encountered in inbred breeding schemes:

    - the frequent unavailability of reliable pedigree information, beyond the parents (and

    thus unavailability of ancestor lines to genotyping)

    - the potential to genotype only advanced generations of selfing, when the number of

    lines has been narrowed down and the trial precision of trials increased, constraining

  • 13

    to compute IBD at the F6 or F7 stage without any marker information between the

    initial cross and the resulting progenies

    - the very high number of parents of the mapping population yielding very small full-sib

    families, and an uneven (L shaped) distribution of half-sib family sizes

    Simulation of the inbred breeding scheme

    Every year, new plant breeding programs are started. The choice of the numerous parents for

    crosses combines on one hand lines with the highest genetic merit for some traits of interest

    like yield or quality, and on the other hand, some lines of specific interest such as special

    quality or pest/disease resistance, sometimes taken in old or exotic material. The original

    crossing scheme in a breeding program is very dependent on the breeder for the choice of the

    parents, but the breeding process is often closely the same.

    To reproduce the steps of the breeding programs, an S-PLUS (2000) function was developed.

    This function was designed by analysing wheat breeders information and allele diversity and

    frequencies made available by recent studies (DONINI et al. 2000; RODER et al. 2002).

    All simulations follow a similar procedure: (Figure 1)

    Figure 1 around here

    Marker-genotypes construction: Before the start of the breeding process, we considered only

    a few founder lines, with full linkage disequilibrium across all their genome, hence between

    all the markers and the QTLs. We also imposed that at this generation, no founder lines had

    any alleles in common with any other. Thus, line 1 carried only markers and QTLs coded 1,

    line 2 only markers and QTLs coded 2, and the last founder line (the NPth) carried only

    markers and QTLs coded NP all along its genome. The genome of simulated individuals was

    composed of 21 chromosomes of 100 cM, with markers evenly spaced every d centiMorgans

  • 14

    all along the chromosomes, with two markers on each chromosome telomeres. On

    chromosome 1, we simulated a single QTL between two markers. In addition to the QTL of

    interest, we simulated on the other chromosomes Npoly=40 randomly located QTLs (that

    could therefore be linked or not) with random effects to simulate the polygenic contribution to

    the trait values.

    First generation of crosses: From the founder lines generation, circular crosses, i.e. 1*2, 2*3,

    … , NP-1*NP, NP*1 were performed. We then derived lines by self-pollination during five

    generations in order to obtain F6 lines. NP mixed sub-populations of the same size (same

    contribution for all founders for this first generation) were derived, giving the “G0”

    generation. The population was considered as totally fixed, by sampling only one gamete after

    the last self-pollination. This stage corresponded to the end of a breeding cycle. The

    recombination procedure was based on randomly placed chiasmatas with no interference.

    Quantitative trait: The parameters for the creation of the quantitative trait are the QTL

    heritability (h²QTL) and the heritability of the polygenes (h²poly).

    We created at the founder line generation the NP allele effects for the QTL and the Npoly

    polygenes. The NP=20 possible effects of the QTL were drawn from a normal distribution

    with mean 0 and variance 1. Then the QTL variance (VarQTL) was calculated at the true QTL

    position, and the NP=20 effects for each of the Npoly polygenes were extracted from a normal

    law with mean 0 and variance [VarQTL*(h²poly/h²QTL)]/Npoly]. Finally, the true variance

    accounted by the polygenes was computed (VarPoly), and a random normally distributed noise

    with variance 2eσ = [VarQTL *(1/h²QTL-1) - VarPoly ] was added to simulate phenotypic values

    of the trait. Thus, the ratio of the additive variance explained by the QTL on the total

    phenotypic variance is exactly equal to the specified value h²QTL while the ratio of polygenic

    QTL on the total phenotypic variance could be slightly different from the specified h²poly.

    Hence, the allele effects and the environmental variance were created at the first generation,

  • 15

    and remained constant for all the generations even if the number of alleles decreased.

    Nevertheless, the environmental variance was adjusted at the last generation to set the desired

    QTL heritability before performing the QTL detection.

    Overlapping generations and matrix of crosses: When the genotype and phenotype of the

    lines were obtained, virtual breeding schemes have been performed. Two hypotheses were

    implemented by extrapolating information obtained by breeders:

    - The “overlapping” choice of the parents. All the parents were not necessarily extracted from

    the last generation only, but a proportion of them (parameter) could originate from older ones

    - The influence of a matrix of crosses on the structure of the resulting progeny of a cross

    breeding program, which really influenced the effective population size

    The design of crosses at the beginning of a breeding scheme could be seen as a geometric

    series, since the representation of parents in the selected progeny is uneven, L-shaped rather

    than random. For example, if a given line, say X, is the most cultivated line at a given period

    (with the best agronomic performance in a range of environments), X will usually be crossed

    to many other lines to fully exploit its genetic value. After self-pollination and selection, a

    certain number of lines coming from this parent X will still remain at the F6 stage, and will

    form one of the largest half-sib families of all the breeding scheme (containing possibly some

    full-sibs when a specific cross is particularly outstanding). On the other hand, an exotic plant

    with a really focused interest but with low agronomic performance may also be used to

    initiate crosses, but at a smaller scale. Some of its offspring will also probably be selected but

    at a much smaller extent.

    A matrix of crosses was implemented to reproduce the formation of half-sib and full-sib

    families during a cycle of breeding, hence taking into account unbalanced contributions of

    parents to the final population. This matrix was filled only upper diagonal, with the parents

    sorted from the best ones to the exotic ones. The choice and order of the parents to fill the

  • 16

    matrix were based on their anteriority. We considered that the best agronomic ones came from

    the closer generations of breeding, and that the exotic ones came from older generations of

    breeding. The “overlapping” option extracted 80% of parents from the closest generation of

    breeding, and 20% from all the older generations (accounting for 10% of the resulting

    progeny). On average, 285 crosses were performed per simulated selection cycle (out of 4750

    possible crosses in the full matrix of crosses). Note that each cross gave, on average, 1.75 full-

    sibs and that each parent was found, on average, in ten progenies. Thus, each individual is

    related, on average, to 8.25 half-sibs and related to 0.75 full-sib.

    The resulting half-sib families sizes are presented in figure 2.

    Figure 2 around here

    Performing n breeding cycles: After the first generation, a loop on the number of breeding

    cycles was performed and the parents of crosses, the resulting progenies and the phenotypic

    data were stored. All available generations were used to build the next. The last breeding

    cycle, NG, was used for QTL detection.

    Note that at the beginning, all the allele frequencies were equal, which was not the case after

    many generations due to genetic drift and non-panmictic conditions. NP alleles with different

    effects were possible at each QTL locus and at each marker. All the markers and QTL were in

    full linkage disequilibrium at G0 but were not after many recombinations (5 generations of

    self-pollination by cycle, and NG cycles before the mapping generation).

    Thus, it is easy to anticipate that the information carried by markers and QTL will be different

    in many cases, and that ANOVA based method are likely to be unefficient.

    As the goal of this article is not to anticipate the influence of selection on the QTL detection,

    we did not performed recurrent pedigree selection on the value of the quantitative trait for the

  • 17

    choice of the parents, or intra-cross-selection during the self pollination process. We just

    wanted to anticipate the influence of the structure of the population and the effect of the

    design of crosses and breeding cycles for a non-selected trait, at non-selected locus.

    Design of simulations:

    Standard setting: as proposed by XU (1998), instead of performing simulations on a factorial

    design of every possible parameter combinations, we chose a standard setting for each

    parameter. The conditions of this standard setting were 21 chromosomes of length 100 cM

    each with 11 markers spaced every 10 cM, a QTL segregating at position 45 on chromosome

    1 with heritability 0.1. Npoly=40 polygenic QTL were randomly placed on the other

    chromosomes setting a total genetic heritability (h²g) of 0.435 (with standard deviation 0.051

    among the 100 repetitions). For each cycle, about 500 individuals were created at the F6 stage

    coming from 100 parents. The studied generation came from the 10th cycle of breeding,

    having considered NP=20 founder lines at the beginning of selection.

    Comparison of IBD formula (1) and IBD formula (2) for three levels of QTL heritability:

    Both formulae were tested for the same set of 100 independent replicates. Three different

    quantitative traits per replicate were created at the last generation of breeding scheme with

    QTL heritabilities 0.05, 0.1 and 0.2. Total genetic heritability (h²g) was initially set to 0.5 but

    after 10 generations of crosses (and genetic drift), some stochastic differences between

    replicates appeared. Thus, h²g values are 0.427 (0.063 standard deviation among 100

    replicates) for h²QTL=0.05, 0.435 (0.051) for h²QTL=0.1, and 0.456 (0.047) for h²QTL=0.2.

    Different experimental conditions: We first tested the robustness of formula (2) for different

    marker bias conditions. To avoid stochastic differences between the tested conditions, the 100

    same replicates from the standard setting were used before adding bias to the marker

    information. Thus, differences between the results within those conditions can be directly

  • 18

    attributed to this added bias on marker information. Created biases included (1) missing

    marker information (NA) (10% of randomly placed NA in the progenies), (2) portion of non-

    IBD Alike-In-State (AIS) alleles (a portion of allele codes were changed to other existing

    allele codes in parents and progeny files to obtain 25% of AIS (i.e. not IBD) alleles in the

    whole genome).

    We then varied out parameters around the standard setting, changing only one parameter at a

    time: these parameters include (1) the total number of founder lines in the base population

    (NP: 40 vs 20); (2) the number of breeding cycles (NG: 20 vs 10) ; (3) the marker density (d:

    20 vs 10) ; (4) the number of lines at the mapping generation (n: 250 vs 500) and number of

    parents (m: 50 vs 100)

    We also tested the 20 cM marker density condition (3) with 25% of AIS alleles in order to

    anticipate the effect of this added bias at a lower marker density. Formula (2) was used for the

    analysis of all these variations, except for the NG=20 case, where we tested both formulae.

    Some parameters remained constant for all the simulations: the total number of chromosomes

    (21), the number of polygenic QTL (40), the position of the QTL on chromosome 1 (45 cM

    except for the d=20cM case where the QTL position is 50 cM), the total genetic heritability

    originally set to 0.5.

    RESULTS

    We report the results of the two step variance component analysis on the different settings

    below. Due to computer constraints, only every third centiMorgan was tested for the presence

    of a QTL. Under each condition, the detection was performed for 100 random replicates.

    Parameters estimates and their standard error are reported. The empirical thresholds were

    computed from the analysis of 1000 replicated data sets. In the special case of the estimation

    of the QTL position, we measured a confidence interval (CI) based on the LOD-Drop-Off

  • 19

    (LDO) method (LANDER and BOTSTEIN 1989) calculated for each significant replicate. With

    this interval, we calculated the frequency at which the true QTL location was included in the

    LOD- Drop-Off CI. We also measured the power that the true QTL position was included in

    the CI determined by 4 times the standard error of the estimated position. Finally, for the

    significant replicates and at the true QTL position, we calculated the bias on the QTL

    heritability induced by our method, for both formulae. We then computed the QTL

    heritability over an interval including the true QTL position.

    Figure 3 around here

    The average likelihood ratio test profiles over 100 replicates for the different settings explored

    are presented in Figure 3. As expected, we notice a strong influence of the magnitude of the

    QTL effect (i.e. the heritability of the QTL) on the LR profile (Figure 3a). Formula (2) -

    which takes into account ancestor pedigree relationships as estimated by markers, to infer the

    IBD probabilities - highly outperforms formula (1) for the three levels of QTL heritabilities in

    terms of detection power. For subsequent simulations, we used formula (2) only.

    The method seems robust to biased information on markers (Figure 3b). Indeed, likelihood

    ratio test profiles under conditions supposed to be found in real breeding schemes (i.e. non-

    informative alleles or wrongly informative, biased map estimation) shows only small

    differences between the different tested conditions, except for the 10% missing marker

    information on the last generation where the LR profile lies below the profile obtained with

    complete marker information. Alike In State alleles that come from different founder lines –

    hence linked to different, non-IBD QTL alleles - have only a small influence on the LR curve.

    Figures 3c and 3d show the variations around the standard setting, for the number of founders,

    number of breeding cycles, total number of F6 and marker density. For a constant mean size

  • 20

    of the derived populations (10 F6 lines per parent), switching from 500 to 250 F6 generates a

    strong decrease in detection power (see graph 3c, lower curve). We also notice on this figure a

    very low influence of the number of breeding cycles (NG=20) and a strong effect of the

    marker informativeness represented by the number of founder lines (NP=40 against 20).

    Finally, we notice a low influence of the marker density, except on the smoothness of the

    curve which presents informativeness peaks close to the marker positions (Figure 3d).

    Nevertheless, with 25% of AIS alleles, the QTL detection power was found more sensitive to

    a decrease in marker density from a value of one marker every 20 cM downwards. With a

    marker every 10 cM, the decrease in detection power is not very large (see Figure 3b).

    Table 1 around here

    The ability of the method to accurately estimate the parameters of interest can be judged from

    the results presented in Table 1. The accuracy of the QTL estimated position increases with

    the switch from formula (1) to formula (2) and also, as expected, with higher QTL

    heritabilities. Nevertheless, formula (2) leads us to overestimate both QTL and total genetic

    heritabilities, more than formula (1) does.

    For the different designs explored, the smaller population size is the parameter which affects

    the precision of the QTL position estimate most. On the opposite, the higher number of

    founder lines at the beginning of crosses gives the best estimates, as it increases the chance to

    have polymorphic markers between the parents. The influence of the number of breeding

    generations on the accuracy of the parameter estimates also turned out to be small.

    Finally, we notice that the 20 cM density case yields accurate results even if the number of

    markers available to build the relationship matrix and infer the IBD is divided by two.

    Nevertheless, both heritability estimates are less accurate than with a 10 cM density map.

  • 21

    Table 2 around here

    The empirical threshold values of LR test statistics over 1000 replicated simulations are

    reported in Table 2. For all the designs, the critical values are nearly equivalent. This is not

    really surprising as the number of parameters being tested in the random model strategy

    remains the same. We also report in Table 2 the average LR test statistics and the power

    estimates for a Type I error α=0.05 over 100 replicated simulations. First we notice that the

    value of the LR test statistic increased with the value of the QTL heritability and that the

    chance to detect QTL is increased by using formula (2).

    Biased marker information (non-IBD but AIS marker alleles, missing genotypes) does not

    really influence the detection, except for 10% of missing data where the loss in power reaches

    14% in comparison to the standard setting. For the different experimental designs, a higher

    number of founder lines tends to increase the detection power. Finally, the simulations where

    d=20cM and NF6=250 give correct detection powers in comparison to the standard setting.

    Table 3 around here

    In Table 3, we report the size of the confidence interval determined by four standard

    deviations of the estimated position or the drop by one and by two LOD units. These CI have

    been established by only taking the significant runs into account. We also report the

    frequency at which the true position (i.e. 45 cM) is included in the delimited interval

    (percentage of inclusion).

    We notice that the drop by two LOD units gives the confidence interval that is closest to 95%

    for the percentage of inclusion. In most cases, this interval is larger or equal to the one

    delimited by four standard deviations. Nevertheless, as the LOD drop-off method does not

  • 22

    give symmetric intervals, it takes into account the information of the curve in a better way.

    We also notice that the drop off by one LOD unit gives appropriate confidence intervals only

    for the highest QTL heritability.

    Table 4, 5, 6 around here

    Table 4 shows, for the different levels of QTL heritabilities, the parameter estimates for the

    replicates that are significant under the empirical threshold. We notice that, in comparison to

    Table 1 which includes all the replicates, the overestimation of the QTL heritability is higher.

    However, the estimated position is more accurate when averaging over the significant

    replicates only, yielding a smaller standard deviation also.

    In Table 5, we present the QTL and total genetic heritabilities at the true QTL position to

    detect possible bias from the method under formula (1) and (2). We notice that formula (2)

    gives more correct estimates at the true QTL position than formula (1) for all levels of QTL

    heritabilities. As formula (2) gives almost unbiased estimates at the true QTL position, a

    corrective factor for the heritability could then be worked out as a function of the estimated

    position. We thus report in Table 6 an attempt to give more appropriate estimate of the

    heritability for this kind of methods. We built an “Averaged Confidence Interval Heritability”

    by averaging the heritability over the confidence interval around the position estimate (which

    is supposed to include the true QTL position), instead of taking only a point heritability

    estimate at the detected position. We notice that this corrective factor yields results very close

    to the true parameter value with formula (2) but underestimates both heritabilities with

    formula (1).

  • 23

    DISCUSSION

    Obviously, many statistical methods already exist to map QTL in inbred plant material;

    however, these methods mainly focus on a single bi-parental cross. Other methods have been

    developed to address more challenging population structures (XIE et al. 1998; XU 1998; Yi

    and XU 2001, for example). Nevertheless, these methods do not appear to be easily extended

    to highly fragmented populations, at any selfed or back-crossed generation, and coming from

    many different parents. They also do not take into account the possibility for alleles to be IBD

    if ancestor pedigrees are not available.

    In this study, we extended the QTL mapping methodology proposed by GEORGE et al. (2000)

    and based on a two-step IBD variance component approach, to typical plant breeding

    populations made up of selfed lines which may have either: one or two parents in common,

    parents related to each other or not related to each other. In this paper we studied the simplest

    possible scenario where no relevant information about the ancestors’ pedigrees was available.

    The power and accuracy of this method were assessed using simulated data mimicking

    conventional breeding programs in cereals, in an effort to reproduce actual conditions of

    marker and gene frequencies and linkage disequilibrium across the parental lines.

    In constructing the matrix of IBD probabilities, a more thorough use of the marker data was

    achieved by calculating the genetic similarities between the parents. The extent to which the

    matrix G was modified from formula (1) to formula (2) is quite large. The proportion of PIBD

    values equal to zero with formula (1) – those values between non-sib lines – and replaced by

    non-zero values, was equal to 91%, as calculated from the distribution of family sizes featured

    in Figure 2. The inferred relatedness patterns between non-sibs leads to a substantial

    improvement of the accuracy of the position estimates and of the QTL detection power.

    Nevertheless, a downside of this improvement is a stronger overestimation of the QTL

    heritability as we shall discuss below.

  • 24

    The method performed quite well for all the tested sources of bias (missing marker data, non-

    IBD alike-in-state alleles). Two complementary explanations can be put forward to explain

    the observed loss in statistical power in the settings with non-IBD AIS alleles, observed with

    the lower marker density:

    1- the lower chance to have informative markers flanking the interval being scanned.

    Thus, informative flanking markers had to be fetched further apart on average, thus

    decreasing in turn the estimates accuracy of the putative QTL’s allelic state.

    2- A higher proportion of alleles that are AIS but not IBD should also have generated an

    upward bias of some of most genetic similarity estimates between the parents. This, in

    turn, will have affected the estimates of IBD probabilities and concomitantly the

    additive relationships between individuals thereby generating an upward bias of their

    estimates for non-full-sibs.

    In our population design, there is a strong within family linkage disequilibrium that can be

    exploited by comparing the parent’s genotypes to the current F6, which accumulated relatively

    few cross-overs. Formula (1) is solely based on the utilization of this linkage disequilibrium,

    and is similar to that used by XIE et al. (1998). Formula (2) can be viewed, loosely speaking,

    as an attempt to merge, to some extent, several families together on the basis of the likelihood

    that the parents share the same alleles identical-by-descent at the putative locus. The power

    increase obtained by using formula (2) follows the same principle as that obtained by XIE et

    al. (1998) in his Table 7 when he switched from a 100 x 5 sampling strategy to a less

    fragmented 50 x 10.

    The same comparison can be made to explain the increased detection power observed after 20

    breeding cycles instead of 10. Due to the very uneven parental contribution to the crossing

    scheme at each breeding cycle, some genetic drift takes place regularly during the breeding

  • 25

    cycles leading to the loss of certain haplotypes. Cross-overs increasingly occur between

    chromosome blocks with similar haplotypes and are thus genetically ineffective. This could

    be compared to a situation where a smaller number of “effective” parents were used, which,

    for a constant progeny size, gives enhanced power (XIE et al., 1998). This explanation is

    confirmed by the comparison between the use of formula (1) and formula (2) for the same 100

    replicates, for the NG=20 setting. Formula (2) anticipates the similarities between the parents

    in a better way, hence yielding better detection power.

    QTL heritability readjustment before mapping

    In preliminary simulations, QTL heritabilities changed, from the founder line generation to

    that of the mapping F6 lines, due to random genetic drift and non-panmictic conditions in

    small populations, hence yielding different heritabilities at the last generation. What we are

    concerned with is the QTL heritability at the current mapping generation to allow sounder

    comparison with published results. It therefore seemed sensible to choose that one, as

    opposed to that at the founder generation, as an entry parameter for our study. It is also to

    conciliate a germplasm history effect and a resulting QTL heritability set as a parameter, that

    we did a systematic readjustment of this parameter at the end of the ten breeding cycles. In so

    doing, we did not compromise too much with the assessment of the germplasm history effect

    as far as allelic frequencies and the ratio of the QTL effect over the other polygenic effects are

    concerned.

    Overestimation of the QTL heritability and proposals for a corrected heritability

    CHARCOSSET and GALLAIS (1996)’s conclusions about the overestimation of h2QTL by the R2

    estimator, in the standard fixed-effect ANOVA framework, cannot be directly extended to our

    case, where random effects are fitted to our QTLs. In fact, if the model is known, REML

    estimates of σu2 and of σv2 are unbiased since the estimates of the fixed effects and the

    prediction of the random effects are unbiased – the “U” of the BLUE and BLUP acronyms.

  • 26

    However, in QTL mapping, the model is unknown inasmuch as we do not know whether

    there is a QTL or not, or if we assume that there is one, we do not know which locus must

    have its segregation’s effect fitted in the model.

    In this paper, we related the mean ĥ2QTL from all stochastic realisations, in addition to that of

    the significant QTL’s only. This allowed us to distinguish between the part of overestimation

    due to the Beavis effect (BEAVIS 1994) and other remaining sources of bias. There did remain

    some. However, when, in our simulations, h2QTL was estimated at the QTL’s real position only

    – which, again, one cannot do over real data - this bias disappeared with formula (2). This

    suggests that :

    1- the uncertainty over the QTL’s position generates a substantial bias in the QTL

    heritability estimates by itself. Since the locus retained is the one that yields a model

    with the maximum likelihood ratio (Lmax), it is also likely to be the locus where a

    chance association with the residuals component of the phenotype is strongest. Thus,

    in terms of expectancy, the residuals vector will, on average, play towards either

    decreasing or increasing ĥ2QTL with the same frequency at the QTL’s real position,

    whereas it will play towards increasing it more often at Lmax, since it is, by definition,

    the locus most strongly associated with the phenotypes.

    2- Since formula (2) recovers more of the real information, it was quite expected that

    ĥ2QTL, in this case would approach h2QTL more closely and that formula (1) would give

    an under-estimate, at the QTL’s real position.

    We shall focus on formula (2) (which yields more accurate heritability estimates at the true

    QTL position) to propose a correction of the (over)estimated heritability at the position of

    maximum likelihood. At this stage, we focused on the significant replicates only to be

    representative of what we would find with real data. We plotted the LR curve and that of the

  • 27

    putative QTL heritability estimate from many replicates along the scanned chromosome. The

    highest LR at a certain position also corresponds to the highest detected QTL heritability as

    exemplified by the fact that both curves showed the same pattern, i.e. the estimated QTL

    heritability and the LR curve followed proportional values along the y-axis. This property can

    be mathematically demonstrated.

    For lower levels of QTL heritabilities, the position was poorly estimated for many replicates

    (high standard deviation of the position estimates for the set of significant replicates), and the

    QTL effects were overestimated. This was interpreted as being due to the Beavis (BEAVIS

    1994) effect, because chance association between residuals and genotype can yield a

    maximum for the QTL heritability away from the true QTL position.

    On the contrary, when drawing the same curves for the replicates with the highest QTL

    heritability (i.e. 0.2), we observed the same pattern for the curves (they still follow each other,

    as expected), but the detected position was more often very close to the true QTL position

    (yielding a smaller standard deviation on QTL position estimates for the set of significant

    replicates). Thus, the variance component method applied to high QTL heritabilities did not

    yield such a bias on QTL heritability estimates. This was interpreted as due to a lower effect

    of the residual component, contrary to the case of lower heritabilities.

    To summarize, we pointed out that:

    (i) the estimated QTL heritability curve follows the LR curve.

    (ii) formula (2) yields the correct QTL heritability estimate at the true QTL position,

    (iii) the poor heritability estimates at the detected position are due to wrong position

    estimates, and to the averaging of heritabilities, which were obtained at their own

    different position estimates (hence not at the true one).

    We worked out a sort of Confidence Interval for the estimated heritability, based on the

    Confidence Interval for the QTL position estimate: bearing in mind that the expectancy of the

  • 28

    position where the estimated heritability is equal to the true one lies at the true QTL position.

    Thus a pair of loci that delimits a confidence interval (hence a set of positions) that entails the

    true QTL position will also delimit a corresponding set of estimated heritabilities – at the

    different likely QTL positions within this interval - that entails the true one. The expectancy

    of the heritabilities calculated in the middle of the interval will be higher than the real QTL

    heritability by a certain factor that only depends on the real QTL heritability and on the

    experimental design. On the other hand, towards the ends of the confidence interval, the

    heritability of the QTL would, on average, be underestimated, by a factor that depends not

    only on QTL heritability and experimental design but also on the chosen stringency of the

    confidence interval. Hence, a less stringent confidence interval will contain a higher

    proportion of underestimates. What we observed was that a 95% confidence interval (as

    roughly determined by a 2-LOD drop-off interval) seems to contain just the right proportion

    of under- and overestimates so that when we average the estimated heritabilities over the

    confidence interval, we obtain an unbiased estimate.

    Leads for improvement:

    Taking all the markers to calculate the genetic similarities between the parents seems to be the

    most appropriate solution in order to calculate the ji,π at each putative locus. As for the

    polygenic term of the model, one may argue that the calculation of the matrix A (in v ~

    (0,Aσv2)) could be more precise if it was based on the markers that are actually linked to some

    polygenes, i.e. to some QTLs, instead of using all the markers indiscriminately as we did in

    this study. Implementing this scheme would require a forward iterative search procedure

    whereby the first step of QTL mapping would be carried out with no polygenic term. The

    second step would add a polygenic term with A being equal to the arithmetic mean of the G

    matrices of the different positions of the QTL detected in step one. New QTLs could then be

    detected due to an increase in power, since some background genetic noise would have been

  • 29

    removed or at least, it is expected that the new QTL position estimates will be more precise.

    Hence, in the subsequent round these new position estimates would be used to update A. And

    the QTL search could be iterated until a convergence criterion is satisfied. This procedure is

    somewhat analogous to one option of the Composite Interval Mapping proposed by ZENG

    (1994) in which a best subset of markers was chosen by stepwise regression, then used as

    cofactors in the linear model adjustment in the QTL search. This procedure, though, can bring

    an advantage only if a few QTLs explain the genetic variation as opposed to many with a

    small effect, all over the genome.

    There is still some scope for a more accurate and probably less biased estimation of the

    coefficients of co-ancestries between parents and individuals in order to better estimate the

    parameters of the model and increase the QTL detection power. A first attempt to improve the

    PIBD estimates could be to subtract from all genetic similarities an estimated proportion of

    alleles in common that supposedly unrelated lines have in common – by definition, these

    alleles in common would be alike in state only and not IBD. This method was suggested by

    MELCHINGER (1991). The use of STRUCTURE (PRITCHARD et al. 2000, FALUSH et al. 2003 for

    linked loci), for example, would allow to group parents according to a common selection

    history. Our genetic similarities between the parents would be replaced by the scalar products

    of parents’ decompositions between the inferred clusters. The remaining of the PIBD

    calculation would be identical. Likewise, matrix A could be computed from the same

    decomposition.

    The proposed method has proved to be powerful in detecting medium sized QTL (h²=0.1) in a

    typical set of inbred lines from a complex pedigree, such as those created in common cereal

    breeding programs. The use of this kind of methods could increase the relevance and cost

  • 30

    effectiveness of quantitative trait loci mapping in applied contexts and could provide an

    alternative to the development of specifically designed recombinant population, by exploiting

    the genetic variation actually used by plant breeders. JANSEN et al. (2003) proposed to use

    parental haplotypes sharing to routinely map QTL in breeding populations. The use of

    haplotypes is challenging in this kind of methods where only little information is available on

    founders and on the relationships between parents. It is even more challenging with the

    increasing use of SNPs versus micro-satellite markers. A mixing of the proposed IBD method

    and the method proposed by JANSEN et al. (2003) would be a good solution for mapping QTL

    and properly estimate the haplotype effects, at a lower marking cost. One would first detect

    QTL within an IBD-based variance component framework at a low marker density then use a

    higher marker density for some QTL of interest to build haplotypes and estimate their effects

    within a fixed-effect framework. Such locally high density mapping could allow identifying

    the haplotypes of minimum length that have the most promising effect. Besides, it would

    directly provide markers to manipulate these haplotypes in breeding schemes.

    The methodology developed in this article is currently applied to the analysis of real wheat

    breeding data.

    Acknowledgments: The authors are grateful to the Ministère de l'Economie, des Finances et

    de l'Industrie for its financial support (ASG program n° 01 04 90 6058)

  • 31

    LITTERATURE CITED

    ALMASY,L., and J. BLANGERO, 1998 Multipoint quantitative trait linkage analysis in general

    pedigrees. Am. J. Hum. Genet. 62: 1198-1211.

    BEAVIS W.D., 1994 The power and deceit of QTL experiments: lessons from comparative

    QTL studies. American Seed Trade Association. 49th Annual Corn and Sorghum Research

    Conference. Washington D.C.

    BERNARDO, R., 1993 Estimation of coefficient of coancestry using molecular markers in

    maize. Theor. Appl. Genet. 85: 1055-1062.

    BINK, M. C. A. M., P. UIMARI, M. SILLANPÄÄ, L. JANSS, and R. JANSEN, 2002 Multiple QTL

    mapping in related plant populations via a pedigree-analysis approach. Theor. Appl. Genet.

    104: 751-762.

    CHARCOSSET, A., and A. GALLAIS, 1996 Estimation of the contribution of quantitative trait

    loci (QTL) to the variance of a quantitative trait by means of genetic markers. Theor. Appl.

    Genet. 93: 1193-1201.

    COCKERHAM, C. C., 1983 Covariances of relatives from self-fertilization. Crop Science 23:

    1177-1180.

    DONINI P., J. R. LAW, R. M. D. KOEBNER, J. C. REEVES and R. J. Cooke, 2000 Temporal

    trends in the diversity of UK wheat. Theor Appl Genet 100: 912-917.

    FALUSH D., M. STEPHENS and J. K. PRITCHARD, 2003 Inference of population structure using

    multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164: 1567-

    1587.

    GEORGE A. W., P. M. VISSCHER and C. S. HALEY, 2000 Mapping quantitative trait in complex

  • 32

    pedigrees: a two-step variance component approach. Genetics 156: 2081-2092.

    GILMOUR, A. R., B. R. CULLIS, S. J. WELHAM and R. THOMPSON, 1998 ASREML. Program

    User Manual. Ed. Orange Agricultural Institute, New South Wales, Australia.

    GIMELFARB, A., and R. LANDE, 1994 Simulation of marker-assisted selection in hybrid

    populations. Genet. Res. 63: 39-47.

    GIMELFARB, A., and R. LANDE, 1995 Marker-assisted selection and marker-QTL associations

    in hybrid populations. Theor. Appl. Genet. 91: 522-528.

    HALEY, C.S., and S. KNOTT, 1992 A simple regression method for mapping quantitative trait

    loci in line crosses using flanking markers. Heredity 69: 315-324.

    HARRIS, D. L., 1964 Genotypic covariances between inbred relatives. Genetics 50: 1319-

    1348.

    HOSPITAL, F., L. MOREAU, F. LACOUDRE, A. CHARCOSSET and A. GALLAIS, 1997 More on the

    efficiency of marker-assisted selection. Theor. Appl. Genet. 95: 1181-1189.

    JANSEN, R. C., 1993 Interval mapping of multiple quantitative trait loci. Genetics 135: 205-

    211.

    JANSEN R. C., J-L. JANNINK and W.D. BEAVIS, 2003 Mapping quantitative trait loci in plant

    breeding populations: use of parental haplotype sharing. Crop Sci. 43: 829-834.

    JIANG, C., and Z-B ZENG,, 1995 Multiple trait analysis of genetic mapping for quantitative

    trait loci. Genetics 140: 1111-1127.

    KEMPTHORNE, O., 1955 The correlation between relatives in inbred populations. Genetics 40:

    681-691.

  • 33

    KOROL, A., Y. RONIN, and V. KIRZHNER, 1995 Interval mapping of quantitative trait loci

    employing correlated trait complexes. Genetics 140: 1137-1147.

    LANDE R., and R. THOMPSON, 1990 Efficiency of marker-assisted selection in the

    improvement of quantitative traits. Genetics 124: 743-756.

    LANDER, E., and D. BOTSTEIN, 1989 Mapping mendelian factors underlying quantitative traits

    using RFLP linkage maps. Genetics 121: 185-199.

    LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer

    Associates, Sunderland, MA.

    MALECOT G., 1948 Les mathématiques de l'hérédité, Ed. Masson et Cie, Paris.

    MELCHINGER A. E., M. M. MESSMER, M. LEE, W. L. WOODMAN and K. R. LAMKEY, 1991

    Diversity and relationships among U.S. maize inbreds revealed by restriction fragment length

    polymorphisms. Crop Sci. 31: 669-678.

    MOREAU L., S. LEMARIÉ, A. CHARCOSSET, and A. GALLAIS, 2000 Economic efficiency of

    one cycle of marker-assisted selection. Crop Sci. 40: 329-337.

    MURANTY, H., 1996 Power of tests for quantitative trait loci detection using full-sib families

    in different schemes. Heredity 76: 156-165.

    NEI M. and W. H. LI, 1979 Mathematical model for studying genetic variations in terms of

    restriction endonucleases. Proc. Natl. Acad. Sci. 76: 5369-5373.

    PRITCHARD J. K., M. STEPHENS and P. DONNELLY, 2000 Inference of population structure

    using multilocus genotype data. Genetics 155: 945-959.

    RODER, M. S. , K. WENDEHAKE, V. KORZUN, G. BREDEMEIJER, D. LABORIE, D. et al., 2002

  • 34

    Construction and analysis of a microsatellite-based database of European wheat varieties.

    Theor. Appl. Genet. 106: 67-73.

    SERVIN, B., C. DILLMANN, G. DECOUX and F. HOSPITAL, 2002 MDM a program to compute

    fully informative genotype frequencies in complex breeding schemes. J. Hered. 93(3): 227-

    228.

    S-PLUS, 2000 S-PLUS guide to statistical and mathematical analyses. MathSoft,

    Massachusetts Institute of Technology

    XIE, C., D. D. G. GESSLER and S. XU, 1998 Combining different line crosses for mapping

    quantitative trait loci using the identical by descent-based variance component method.

    Genetics 149: 1139-1146.

    XU, S. 1998 Mapping quantitative trait loci using multiple families of line crosses. Genetics

    148: 517-524.

    XU, S., and W. R. ATCHLEY, 1995 A random model approach to interval mapping of

    quantitative trait loci. Genetics 141: 1189-1197.

    YI, N., and S. XU, 2001 Bayesian mapping of quantitative trait loci under complicated mating

    designs. Genetics 157: 1759-1771.

    ZENG, Z-B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457-1468.

    ZENG, Z-B., 1993 Theoritical basis of separation of multiple linked gene effects on mapping

    quantitative trait loci. Proc. Natl. Acad. Sci. USA. 90: 10972-10976.

  • 35

    TABLE 1

    Estimates of the position, QTL and total genetic heritabilities; h²QTL and h²g respectively

    Experimental design h² g Position ĥ2QTL ĥ2g

    (1) QTL heritability and formula

    h²QTL=0.05

    Formula (1)

    Formula (2)

    0.427 (0.068) 49.44 (25.42)

    46.45 (21.84)

    0.067 (0.035)

    0.085 (0.041)

    0.427 (0.108)

    0.432 (0.107)

    h²QTL=0.1 Formula (1)

    Formula (2)

    0.435 (0.051) 47.76 (20.88)

    45.52 (17.72)

    0.099 (0.042)

    0.129 (0.056)

    0.442 (0.081)

    0.450 (0.083)

    h²QTL=0.2 Formula (1)

    Formula (2)

    0.456 (0.047) 44.91 (7.35)

    45.73 (6.89)

    0.192 (0.058)

    0.228 (0.065)

    0.451 (0.083)

    0.467 (0.091)

    (2) Different experimental designs

    Standard setting 45.52 (17.72) 0.129 (0.056) 0.450 (0.083)

    AIS 25% 47.15 (17.99) 0.135 (0.060) 0.445 (0.084)

    NA 10%

    0.435 (0.051)

    48.55 (18.87) 0.131 (0.098) 0.434 (0.094)

    NP=40 0.451 (0.067) 48.42 (13.75) 0.136 (0.053) 0.446 (0.108)

    NG=20 formula (1) 0.413 (0.059) 48.00 (18.75) 0.091 (0.039) 0.430 (0.091)

    NG=20 formula (2) 0.413 (0.059) 46.89 (15.96) 0.145 (0.049) 0.436 (0.094)

    d=20cM

    d=20, AIS 25%

    0.433 (0.052)

    0.433 (0.052)

    52.27 (23.10)

    52.87 (24.80)

    0.149 (0.070)

    0.147 (0.076)

    0.395 (0.088)

    0.399 (0.087)

    Npar=50, NF6=250 0.403 (0.077) 49.86 (23.25) 0.159 (0.070) 0.441 (0.149)

    The standard setting is a QTL heritability of 0.1, a total genetic heritability (h2g) of 0.435

    (with standard deviation 0.051 among the 100 repetitions) and a 10 cM marker density, with a

    QTL at 45 cM. About 500 individuals are created at the F6 stage coming from 100 parents for

    each breeding cycle. h2g is the result of the readjustment of the residuals in order to obtained

    a given target h2QTL. The studied generation comes from the 10th cycle of breeding, having

    considered 20 founder lines at the beginning of selection.

  • 36

    (1) Each set of runs differs from the simulated QTL heritability noted in column one, and

    by the IBD formula tested. IBD formula (1) takes only into account Half-Sib and Full-

    Sib relationships while formula (2) adds ancestor relationships estimated by markers.

    (2) Each additional set of runs differ from the standard setting by the simulation

    parameter noted in column one. Except the NG=20 case, all the other settings are

    tested with formula (2) only.

    Mean and standard deviations (in parentheses) are calculated among the 100 replicates.

  • 37

    TABLE 2

    LR threshold, test statistic and QTL detection power

    Threshold Test statistic Power (%)

    (1) QTL heritability and formula

    h²QTL=0,05 formula (1) 4.08 4.62 (3.52) 47

    h²QTL=0,05 formula (2) 3.96 5.23 (3.97) 58

    h²QTL=0,1 formula (1) 4.08 7.75 (5.06) 71

    h²QTL=0,1 formula (2) 3.96 9.76 (6.71) 80

    h²QTL=0,2 formula (1) 4.08 22.03 (10.9) 100

    h²QTL=0,2 formula (2) 3.96 23.69 (10.8) 100

    (2) Different experimental designs

    Standard setting 3.96 9.76 (6.71) 80

    AIS 25% 4.28 9.91 (6.56) 72

    NA 10% 4.30 7.81 (5.53) 66

    NP=40 3.70 11.29 (7.09) 89

    NG=20 formula (1) 3.72 5.95 (4.15) 65

    NG=20 formula (2) 3.62 10.62 (5.75) 91

    d=20 4.25 9.21 (6.81) 75

    d=20, AIS 25% 4.94 8.06 (6.50) 64

    Npar=50, NF6=250 4.00 7.96 (3.66) 62

    See Table 1 for the standard setting. Each additional set of runs differs from the standard

    setting by the parameter change noted in column one. Threshold represents the empirical

    threshold calculated for 1000 replicates. Test statistic is the mean and standard deviation of

    the maximum of LR test for the 100 replicates. Power is the percentage of replicates with max

    LR exceeding the empirical threshold.

  • 38

    TABLE 3

    Accuracy of three methods used to infer Confidence Intervals (CI)

    CI- 4 S.D. % 4 S.D. CI- 1 LDO % 1 LDO CI- 2 LDO % 2 LDO

    (1) QTL heritabilities and IBD formula

    h²QTL=0,05 formula (1) 86.40 91 41.78 79 86.66 97

    h²QTL=0,05 formula (2) 72.20 93 41.18 79 85.43 97

    h²QTL=0,1 formula (1) 63.28 90 29.10 82 74.75 99

    h²QTL=0,1 formula (2) 63.40 93 31.04 86 66.62 96

    h²QTL=0,2 formula (1) 29.40 96 17.26 90 34.33 97

    h²QTL=0,2 formula (2) 26.00 98 15.21 91 28.21 97

    (2) Different experimental designs

    Standard Setting 63.40 93 31.04 86 66.62 96

    AIS 25% 63.84 92 26.4 80 62.9 97

    NA 10% 63.24 91 28.48 80 67.6 95

    NP=40 52.24 94 26.66 85 60.34 97

    NG=20, formula (1) 67.60 91 43.95 72 82.80 96

    NG=20, formula (2) 62.20 91 27.30 76 66.44 97

    D=20 93.60 96 36.35 75 73.96 94

    D=20, AIS 25% 94.56 95 35.08 67 75.61 91

    Npar=50, NF6=250 88.16 89 45.61 79 82.82 95

    See Table 1 for the standard setting. “CI- 4 S.D.” represents the size of the CI calculated as

    four times the standard deviation on the estimated QTL position for the significant runs, and

    “% 4 S.D.” represents the percentage of times that the true position is included within the

    delimited interval. “CI- 1 LDO” and “CI- 2 LDO” represent the size of the CI as determined

    by the drop of one and two LOD unit (multiply by 2*ln(10) to convert in LR units)

  • 39

    respectively. “% 1 LDO” and “% 2 LDO” represent the number of times that the true position

    is included within the delimited interval.

  • 40

    TABLE 4

    Estimates of the QTL parameters for different levels of QTL heritability under the

    empirical threshold

    Experimental design h² g Position ĥ2QTL ĥ2g

    QTL heritability

    h²QTL=0.05

    Formula (1)

    Formula (2)

    0.427 (0.068) 47.55 (21.61)

    45.98 (18.05)

    0.092 (0.032)

    0.110 (0.030)

    0.429 (0.110)

    0.437 (0.109)

    h²QTL=0.1 Formula (1)

    Formula (2)

    0.435 (0.051) 49.56 (15.82)

    45.56 (15.85)

    0.117 (0.035)

    0.143 (0.051)

    0.452 (0.081)

    0.450 (0.08)

    h²QTL=0.2 Formula (1)

    Formula (2)

    0.456 (0.047) 44.91 (7.35)

    45.49 (6.50)

    0.192 (0.058)

    0.230 (0.062)

    0.451 (0.083)

    0.471 (0.081)

    Each set of runs differs by the simulated QTL heritability noted in column one, and by the

    IBD formula tested. Mean and standard deviations (in parentheses) are reported only for the

    significant replicates (noted in the “Power” column of Table 2) among 100.

  • 41

    TABLE 5

    Estimates of the QTL and total genetic heritabilities at the true QTL position (45)

    h²QTL =0.05, h2g=0.427 h²QTL =0.1, h2g=0.435 h²QTL =0.2, h2g=0.456

    Formula ĥ2QTL ĥ2g ĥ2QTL ĥ2g ĥ2QTL ĥ2g

    (1)

    (2)

    0.036 (0.037)

    0.051 (0.047)

    0.426 (0.106)

    0.430 (0.108)

    0.068 (0.048)

    0.094 (0.064)

    0.442 (0.080)

    0.447 (0.080)

    0.191 (0.056)

    0.225 (0.066)

    0.455 (0.081)

    0.471 (0.094)

    Formula (1) and (2) are tested for three levels of QTL heritability at the true QTL position.

    h2g is the result of the readjustment of the residuals in order to obtained a given target h2QTL.

  • 42

    TABLE 6

    Averaged Confidence Interval Heritability for the significant replicates over the 2 Lod-

    Drop-off units interval

    heritabilities h²QTL =0.05, h2g=0.427 h²QTL =0.1, h2g=0.435 h²QTL =0.2, h2g=0.456

    No-Selection ĥ2QTL ĥ2g ĥ2QTL ĥ2g ĥ2QTL ĥ2g

    (1)

    (2)

    0.049 (0.028)

    0.054 (0.034)

    0.405 (0.104)

    0.418 (0.109)

    0.072 (0.039)

    0.100 (0.063)

    0.434 (0.083)

    0.438 (0.081)

    0.158 (0.062)

    0.194 (0.067)

    0.438 (0.085)

    0.459 (0.082)

    Formula (1) and (2) are tested for three levels of QTL heritability.

    h2g is the result of the readjustment of the residuals in order to obtained a given target h2QTL.

  • FIGURE 1

    1 2 3 NP Q1 Q2 Q3 QNP 1 2 3 NP

    P1 P2 P3 P4 (…) P99 P100

    matrix of crosses

    * * * *

    P1 P2 P79 P80

    Founder lines: full linkage disequilibrium marker-QTL

    G0 generation

    For i=10 breeding cycles

    Gi generation 500 F6 lines

    QTL Detection

    1 to 4 progenies per cross + half-sib relationships

    circular crosses

    G10 generation

    + P81-P100 : overlapping

    NP sub-populations

    Creation of the QTL and polygene effects

  • 44

    0

    10

    20

    30

    40

    50

    60

    70

    80

    P1 P6 P16

    P26

    P36

    P46

    P56

    P66

    P76

    P86

    P96

    Parent name

    Num

    ber o

    f pro

    geni

    es /

    pare

    nt

    MeanStandard Deviation

    FIGURE 2

  • 45

    FIGURE 3

  • 46

    FIGURE LEGENDS: FIGURE 1: Simulation of the breeding scheme

    FIGURE 2: Mean and standard error for 1000 replicates created by the “matrix of crosses”

    function for the half-sib family size under the standard setting (100 parents, 500 F6).

    FIGURE 3: Comparison of the LR profiles for (a) different levels of QTL heritabilities for

    formula (1) and (2), (b) different biased marker information for the standard setting under

    formula (2), (c) difference in the breeding schemes compared to the standard setting, and (d) a

    density of one marker every 20 cM, with 25% of AIS non-IBD alleles.