IBD- based QTL detection in multi-cross inbred designs: A...
Transcript of IBD- based QTL detection in multi-cross inbred designs: A...
-
1
IBD- based QTL detection in multi-cross inbred designs: A case study of
cereal breeding programs
Sébastien Crepieux*,1, Bertrand Servin† , Claude Lebreton‡ and Gilles Charmet*
*UMR 1095 INRA-UBP, 234 Av. du Brezet, 63039 Clermont-Ferrand Cedex 2, France,
†INRA UMR de Génétique Végétale, INRA/UPS/INAPG, 91 190 Gif sur Yvette, France
‡Limagrain Agro-Industrie, site d’ULICE, av G. Gershwin, BP173, F-63204 Riom Cedex,
France
-
2
IBD-based multi-cross QTL mapping
1 : corresponding author : 1095 INRA-UBP, 234 Av. du Brezet, 63039 Clermont-Ferrand
Cedex 2, France. Email : [email protected]
Phone : 00 33 4 73 62 43 09 , Fax: 00 33 4 73 62 44 53
Key words: QTL detection, variance component, pedigree breeding, IBD, multi-cross
ABSTRACT
Mapping quantitative trait loci in plants is usually conducted using a population derived from
a cross between two inbred lines. The power of such QTL detection and the parameter
estimates highly depend on the choice of the two parental lines. Thus, the QTL detected in
such populations only represent a small part of the genetic architecture of the trait. Besides,
the effects of only two alleles are characterised, which is of limited interest to the breeder. On
the other hand, common pedigree breeding material remains unexploited for QTL mapping. In
this study, we extend QTL mapping methodology to a generalized framework, based on a
two-step IBD variance component approach, applicable to any type of breeding population
coming from inbred parents. The power and accuracy of this method were assessed on
simulated data mimicking conventional breeding programs in cereals. This method can
provide an alternative to the development of specifically designed recombinant population, by
exploiting the genetic variation actually managed by plant breeders. The use of these detected
QTL in assisting breeding would thus be facilitated.
-
3
INTRODUCTION
The availability of molecular markers in the 1980’s has opened new scope for quantitative
genetics and breeding. It was thus anticipated that the manipulation of loci underlying
quantitative traits (QTL) would be as easily feasible as with mendelian factors. This
perspective, however, has remained largely unreached, despite the large corpus of theoretical
studies on marker assisted selection ( e.g. LANDE and THOMPSON 1990; GIMELFARB and
LANDE 1994 , 1995 ; HOSPITAL et al. 1997). The main reason is probably the cost of markers
and the relatively low improvement in selection efficiency that leads MAS to be generally
much more expensive than conventional breeding (MOREAU et al. 2000). The other reason is
that applied breeding programs and QTL research are often disconnected, i.e. carried out by
different teams and using different plant material.
Classically, QTL analyses are carried out on a few progenies from broad base crosses, coming
from a small number of distantly related lines, often including wild relatives. Such analyses
mostly involve bi-parental progenies such as back-crosses (BC), doubled haploids lines (DH),
F2 or recombinant inbred lines (RILs). In the approaches based on this kind of plant material,
the effect of an allele substitution at a candidate locus is tested. This is called the fixed model
approach (XU and ATCHLEY 1995) since it considers a fixed number of distinct alleles (most
often two) at each putative QTL. Statistical methods for the QTL analysis of bi-parental
populations underwent successive improvements through the advent of Interval Mapping
(LANDER and BOTSTEIN 1989) and its linearization (HALEY and KNOTT 1992), the Composite
Interval Mapping (ZENG 1993, 1994 ; JANSEN 1993) and multiple trait QTL mapping (KOROL
et al. 1995; JIANG and ZENG).
On the other hand, the breeder’s material is far from the studied bi-parental populations.
Breeders generally handle many small families from crosses between (often) highly related
elite lines. Thus, the above described methods are poorly adapted. Moreover there are many
-
4
drawbacks for the breeder’s use of the QTL found on bi-parental populations. First, when
only two parents are considered, some markers and potential QTLs are more likely to be
monomorphic, even if parental lines are carefully selected for trait divergence. Since, by
definition, QTL can only be found at polymorphic sites in the genome, the expected number
of QTL detected with a bi-parental cross will be lower than that expected when analyzing
several crosses at a time (assuming the total number of genotypes is not the limiting factor).
The second drawback is that the QTL effect is estimated as a contrast between two alleles and
in one genetic background only. Therefore, in that context, the improvement of a line by the
introgression of a QTL allele in a completely new genetic background is rather unpredictable,
because of possible epistatic interaction between QTL and genetic background. Finally, from
an economic standpoint, the cost of creation of large single cross progenies and specific trials
for trait evaluation to perform QTL detection is quite high and often at the expense of other
selection programs.
All these drawbacks reduce the breeders’ interest for implementing such experimental designs
when funding and work are constrained. Bi-parental crosses are usually preferred for more
upstream studies, e.g. genomics: the fine-mapping of a QTL, which is a pre-requisite for its
positional cloning, is easier when fewer QTLs are segregating. In contrast, the breeders’ focus
will be to characterize the effect of a wide range of alleles in his germplasm. Methods for
simultaneous detection and manipulation of QTL in breeding programs would thus enhance
the applicability of MAS.
In plant breeding, new methods for QTL detection in complex designs, close to those used in
real breeding schemes have already been developed. MURANTY (1996) suggested to work in
plants with progenies from several parents, in order to achieve a high probability to have more
than one allele at a putative QTL, and also to have a more representative estimate of the
variance accounted for by a QTL. She operated in a fixed-effect framework. Simulations
-
5
demonstrated that a higher QTL detection power was achieved, for a given sample size. XU
(1998) compared the QTL detection powers obtained with random effect models and fixed
effects and found similar values for individual family sizes as low as 25 individuals.
However, in more unbalanced designs, the random effect approach was presumed to be more
suited as it can handle any arbitrary pedigree of individuals (LYNCH and WALSH 1998, XU
1998). Efficient methodologies for more fragmented populations in plants have been
developed (XIE et al. 1998; YI and XU 2001, BINK et al. 2002, JANSEN et al. 2003 for
example), but their extension or implementation for any complex plant designs, implying a
mixing of half-sibs and full-sibs families of different sizes, at any generation of selfing and
with the hermaphrodite status of parents, is not straightforward. The identical-by-descent
(IBD)-based variance component analysis is a powerful statistical method for QTL mapping
in complex populations and can be used in pedigrees of arbitrary size and complexity
(ALMASY and BLANGERO 1998). These IBD-based variance component analyses are derived
from the assumption that individuals of similar phenotype are more likely to share alleles that
are identical by descent. The construction of IBD matrices for alleles at each tested position
along the genome, and the fitting of random effect models (which assumes that QTL effects
are normally distributed) offer an appropriate method to map QTL if the progeny population
is large enough and if the progenies are connected in some way. Besides, these models do not
need to assume a known, finite set of alleles at each putative QTL. Thus, they offer a less
parameterized statistical environment in which to map QTL, because only the variances need
to be estimated instead of every allele substitution effect. Generally, the IBD-based
approaches assume a between family IBD-likelihood of zero (i.e. no parents in common
between the two families), and thus, consider the parents as founders. However, this
assumption is often wrong in common breeding pedigrees. Furthermore, in fragmented
situations, i.e. where there are many families of small sizes (especially when the genotyping
-
6
takes place at a late stage in pedigree breeding, where we may easily end up with as few as
one or two lines per cross), the IBD-likelihood matrix can be very sparse. Hence, there could
be much to be gained in exploring the actual between family IBD-likelihoods.
In the work related in this paper, we took over these developments and further assumed a non-
zero IBD-likelihood between non-sib lines. We then present a unified IBD-based variance
component analysis framework, to map QTL in any kind of multi-cross designs involving
self-pollinating species, at any generation. To test the accuracy of the method, we developed a
simulation program which mimics the steps of real breeding schemes. We chose the
simulation parameters according to the information provided by breeders on their real
material in order to be as comprehensive as possible in the range of genetic configurations
explored.
This method can provide an alternative to the development of specifically designed
recombinant population, by exploiting the genetic variation actually managed by plant
breeders. Our method can thus provide breeders with valuable information about the breeding
values of their material and help them to design selection strategies.
METHODS
Two-step IBD based variance component method
The method used to map QTL in a complex inbred pedigree is a two step variance component
method, as described in GEORGE et al. (2000). Hence, this method first consists in
constructing the (co)variance matrix of fixed and random effects at each putative QTL
position and then estimates the likelihood of the presence of a QTL at these positions using
appropriate linear models.
-
7
These two steps are common to all interval mapping based variance components methods.
What differs mainly among all the published methods is the way to calculate the IBD
probabilities (see GEORGE et al. 2000 for a review of IBD probabilities calculation). We
adopted a deterministic approach to infer IBD probabilities for any generation of
recombination and breeding scheme (e.g. F2, Fn, RIL, BCn), based on the MDM program
(SERVIN et al. 2002).
Mixed linear models
We assume that the quantitative trait is a linear combination of fixed design effects, a putative
QTL effect (with additive or/and dominance effect) and additive polygenic effects. The
random polygenic effect is seen as the cumulative effect of all loci affecting the quantitative
trait that are unlinked to the QTL. The model is:
)1(eZvZuXy +++= β
where y is an (m*1) vector of phenotypes, X is an (m*s) design matrix, β is a (s*1) vector of
fixed effects, Z is an (m*q) incidence matrix relating records to individuals, u is a (q*1) vector
of additive QTL effects, v is a (q*1) vector of additive polygenic effects and e is the residual.
We assume that the random effects u, v and e are uncorrelated and distributed as multivariate
normal densities: ),0(~;),0(~;),0(~ 222 evu INeANvGNu σσσ , with 222 , evu and σσσ being
respectively the additive variance of the QTL, the polygenic variance and the residual
variance. A is the (q*q) additive genetic relationship matrix; G is the (q*q) (co)variance
matrix for the QTL additive effects conditional on marker information; and I is the (m*m)
identity matrix.
The model without QTL segregating in the population is, with the same notations:
)2(eZvXy ++= β
-
8
Computation of the IBD probabilities
We consider a mapping population composed of several sub-populations of small size. Each
of these sub-populations is an offspring coming from an inbred pedigree started with two
parents. For example, these sub-populations could be produced by several consecutive
selfings (e.g. RILs) or back-crossings. We want to compute the probability that two
individuals taken from any of these sub-populations share IBD alleles at a given locus of their
genome. If we consider a pair of individuals from the mapping population, they may be (i)
taken from the same sub-population, in which case they are full-sibs (ii) taken from two
different sub-populations. In this last case, if one of the parents is common to the two sub-
populations, the two individuals will be half-sibs, if the parents of the two sub-populations are
distinct, the two individuals are considered as unrelated. We will now draw relevant
calculations of the IBD probabilities for each of these cases.
IBD value between two sibs at a QTL: Within each full-sib family of the breeding scheme,
only two alleles are segregating giving only three possible genotypes at the QTL: QQ, Qq and
qq.
Following XIE et al. (1998) notations, the IBD value of two individuals i and j, is measured
as:
QqQqorQqqqqqqqor
qqQQQqQQQQQQ
forforfor
jiji −−−
−−−
== ,
,,,
012
2 ,, θπ
where ji,π are the ijth elements of G, and ji,θ is MALECOT’s (1948) coefficient of coancestry.
As pointed out by many authors, when inbreeding is present, ji,π is not interpreted as the
proportion of alleles IBD, but rather as twice the coefficient of coancestry (KEMPTHORNE
1955; HARRIS 1964; COCKERHAM 1983).
-
9
Inferring the IBD value of a QTL from markers: The IBD value is completely determined
by the genotypes of two individuals at the QTL of interest. The actual QTL genotype of an
individual, however, is not observable, and must be inferred from flanking marker
information. We denote the following probabilities:
)/Pr(),/Pr(),/Pr( 012 MjMjMj IqqpandIQqpIQQp === .
We write [ ] Tiiii pppp 012= and [ ]Tjjjj pppp 012= . The conditional expectations of the
IBD values between two full sibs are:
jTiMjiji CppIE == )/( ,, ππ for between individuals,
and jT
Mjiji pcIE == )/( ,, ππ for the individual with itself, where:
=
=
121
210111012
candC
In the rest of the paper, this formula, for computing IBD probabilities, will be referred to as
formula (1).
General case: Above, we emphasized the calculation of IBD for the full-sib case. The
generalization to the half-sib case is trivial.
Using this first formula to compute IBD probabilities, we assumed that parents of sub-
populations were unrelated, i.e. they did not share any common ancestors. However, in
practice, these parents were coming from previous generations of breeding and were very
likely to share IBD alleles in their genome (due to the intensive use of some “star varieties”,
for example). In order to take these possible relationships between parents into account, we
estimated coancestry coefficients between them, using molecular information, as suggested by
BERNARDO (1993). The pedigree information, if available, could be used to that end.
However, the selection pressure may generate a discrepancy in the predicted proportion of
parental genomes shared by the current lines. Besides, we assumed that the pedigree
-
10
information was often very scarce or even unavailable. Hence, we resorted to a genetic
similarity based index to estimate that proportion of genome.
First, we generalized the C matrix to the known half-sib, full-sib and unrelated individuals by
introducing the coancestries between parents, estimated by markers in our case. We
considered two individuals taken in two sub-populations. Let P1 and P2 be the parents of the
first sub-population and P3 and P4, the parents of the other sub-population. We denote by
GSP1P3, GSP1P4, GSP2P3 and GSP2P4 the estimates of the coefficients of coancestry between
these parents. Taking into account possible coancestries between P1 and P3 on one hand and
P2 and P4 on the other hand, the C matrix can then be re-written as:
+=
4242
42423121
31
3131
1
20)(
02
PPPP
PPPPPPPP
PPPP
GSGSGSGSGSGS
GSGSC
Note that in the full-sib case P1=P3 and P2=P4, so that GSP1P3 and GSP2P4 are equal to one
and the C1 matrix is similar to C. Similarly, the relevant C matrices for half-sibs individual or
for unrelated individuals can be obtained by replacing respectively GSP1P3 by one and GSP2P4
by zero on one hand, and both GSP1P3 and GSP2P4 by zero on the other hand.
Similarly, taking into account the coefficients between the parents P1 and P4 on one hand and
P2 and P3 on the other hand, we can re-write the C matrix as :
+=
02)(21
20
4141
32413241
3232
2
PPPP
PPPPPPPP
PPPP
GSGSGSGSGSGSGSGS
C
Finally, we can draw a general formula for the conditional expectation of the IBD values
between two individuals coming from four (distinct or not) inbred parents:
jTij
TiMjiji pCppCpIE 21,, )/( +== ππ
-
11
))])([()])([(
)])([()])([((2)/(..
121
0121
042121
2121
032
121
0121
241121
2121
231,,
iijjPPiijjPP
iijjPPiijjPPMjiji
ppppGSppppGS
ppppGSppppGSIEei
++++++
+++++== ππ
The conditional expectation of the IBD for an individual with itself remains:
12,, 2)/( jjMjiji ppIE +== ππ
In the rest of the paper, this formula, using the C1 and C2 matrices, will be referred to as
formula (2).
The elements in the additive relationship matrix A are estimates of the proportion of IBD
genome between any two lines. The ai,j elements is the proportion of IBD genome between
line i and line j, based on genetic similarities.
For both A and G matrices, genetic similarities (GSi,j) were computed using NEI and LI
(1979) formula.
Implementation of the IBD formula
We used the deterministic approach of the MDM program (SERVIN et al. 2002) to compute all
the pi and pj probabilities. IBD-likelihoods were computed every 3cM. Two flanking markers
were used to infer the genotypes probabilities. In the frequent case where the two parents
shared the same marker alleles at one or two loci flanking the putative QTL position, the next
closest markers to the interval were used. It can easily be demonstrated that the IBD
probabilities calculated at a putative QTL will be more precise if the flanking markers are
highly polymorphic. Issues on the use of better estimates of the coefficients of co-ancestries
will be discussed on the last part of the article.
All the G matrices were then inverted, and written in ASREML (GILMOUR et al. 1998) format
for user-defined inverse (co)variance matrices. We also computed the additive relationship
matrix A. Then, it was inverted and written in ASREML format (end of step1).
In step 2, ASREML provided restricted maximum-likelihood (REML) estimates of (1) and
(2). To test for the presence of a QTL against no QTL at a particular chromosomal position,
-
12
we used the Log Likelihood Ratio test: LR= -2ln( L0(H0, no QTL present) – L1(H1, QTL
present) ), where L1 and L0 represent the likelihood values of (1) and (2) evaluated at the
REML solutions, respectively.
Test statistic under the null hypothesis
The choice of the threshold in this kind of population is always challenging. Many
publications (ZENG 1994; XU and ATCHLEY 1995 for example) report that when a
chromosomal interval is being scanned, the empirical distribution of LR follows a mixture of
two Chi-square distributions, with one and two degrees of freedom, respectively.
Since this article deals with simulated data, it is possible to replicate data under the null
hypothesis of no QTL segregating, construct the empirical distribution of LR and derive
empirical threshold by choosing the 95th percentile of the highest test statistic, generally over
500 or 1000 stochastic realizations. In this paper, we calculated an empirical threshold for
every set of parameters to see whether significant differences on threshold appeared or not.
Hence, for each set of parameters tested, we ran 1000 additional simulations with no QTL
segregating. We increased the polygenic variance such that the total genetic variance
remained unchanged. We thus determined the empirical threshold by choosing the 95th
percentile from the list of 1000 runs.
A SIMULATION STUDY: The case of a cereal breeding program
We chose the cereal example in the simulation study as it contains most of the difficulties
generally encountered in inbred breeding schemes:
- the frequent unavailability of reliable pedigree information, beyond the parents (and
thus unavailability of ancestor lines to genotyping)
- the potential to genotype only advanced generations of selfing, when the number of
lines has been narrowed down and the trial precision of trials increased, constraining
-
13
to compute IBD at the F6 or F7 stage without any marker information between the
initial cross and the resulting progenies
- the very high number of parents of the mapping population yielding very small full-sib
families, and an uneven (L shaped) distribution of half-sib family sizes
Simulation of the inbred breeding scheme
Every year, new plant breeding programs are started. The choice of the numerous parents for
crosses combines on one hand lines with the highest genetic merit for some traits of interest
like yield or quality, and on the other hand, some lines of specific interest such as special
quality or pest/disease resistance, sometimes taken in old or exotic material. The original
crossing scheme in a breeding program is very dependent on the breeder for the choice of the
parents, but the breeding process is often closely the same.
To reproduce the steps of the breeding programs, an S-PLUS (2000) function was developed.
This function was designed by analysing wheat breeders information and allele diversity and
frequencies made available by recent studies (DONINI et al. 2000; RODER et al. 2002).
All simulations follow a similar procedure: (Figure 1)
Figure 1 around here
Marker-genotypes construction: Before the start of the breeding process, we considered only
a few founder lines, with full linkage disequilibrium across all their genome, hence between
all the markers and the QTLs. We also imposed that at this generation, no founder lines had
any alleles in common with any other. Thus, line 1 carried only markers and QTLs coded 1,
line 2 only markers and QTLs coded 2, and the last founder line (the NPth) carried only
markers and QTLs coded NP all along its genome. The genome of simulated individuals was
composed of 21 chromosomes of 100 cM, with markers evenly spaced every d centiMorgans
-
14
all along the chromosomes, with two markers on each chromosome telomeres. On
chromosome 1, we simulated a single QTL between two markers. In addition to the QTL of
interest, we simulated on the other chromosomes Npoly=40 randomly located QTLs (that
could therefore be linked or not) with random effects to simulate the polygenic contribution to
the trait values.
First generation of crosses: From the founder lines generation, circular crosses, i.e. 1*2, 2*3,
… , NP-1*NP, NP*1 were performed. We then derived lines by self-pollination during five
generations in order to obtain F6 lines. NP mixed sub-populations of the same size (same
contribution for all founders for this first generation) were derived, giving the “G0”
generation. The population was considered as totally fixed, by sampling only one gamete after
the last self-pollination. This stage corresponded to the end of a breeding cycle. The
recombination procedure was based on randomly placed chiasmatas with no interference.
Quantitative trait: The parameters for the creation of the quantitative trait are the QTL
heritability (h²QTL) and the heritability of the polygenes (h²poly).
We created at the founder line generation the NP allele effects for the QTL and the Npoly
polygenes. The NP=20 possible effects of the QTL were drawn from a normal distribution
with mean 0 and variance 1. Then the QTL variance (VarQTL) was calculated at the true QTL
position, and the NP=20 effects for each of the Npoly polygenes were extracted from a normal
law with mean 0 and variance [VarQTL*(h²poly/h²QTL)]/Npoly]. Finally, the true variance
accounted by the polygenes was computed (VarPoly), and a random normally distributed noise
with variance 2eσ = [VarQTL *(1/h²QTL-1) - VarPoly ] was added to simulate phenotypic values
of the trait. Thus, the ratio of the additive variance explained by the QTL on the total
phenotypic variance is exactly equal to the specified value h²QTL while the ratio of polygenic
QTL on the total phenotypic variance could be slightly different from the specified h²poly.
Hence, the allele effects and the environmental variance were created at the first generation,
-
15
and remained constant for all the generations even if the number of alleles decreased.
Nevertheless, the environmental variance was adjusted at the last generation to set the desired
QTL heritability before performing the QTL detection.
Overlapping generations and matrix of crosses: When the genotype and phenotype of the
lines were obtained, virtual breeding schemes have been performed. Two hypotheses were
implemented by extrapolating information obtained by breeders:
- The “overlapping” choice of the parents. All the parents were not necessarily extracted from
the last generation only, but a proportion of them (parameter) could originate from older ones
- The influence of a matrix of crosses on the structure of the resulting progeny of a cross
breeding program, which really influenced the effective population size
The design of crosses at the beginning of a breeding scheme could be seen as a geometric
series, since the representation of parents in the selected progeny is uneven, L-shaped rather
than random. For example, if a given line, say X, is the most cultivated line at a given period
(with the best agronomic performance in a range of environments), X will usually be crossed
to many other lines to fully exploit its genetic value. After self-pollination and selection, a
certain number of lines coming from this parent X will still remain at the F6 stage, and will
form one of the largest half-sib families of all the breeding scheme (containing possibly some
full-sibs when a specific cross is particularly outstanding). On the other hand, an exotic plant
with a really focused interest but with low agronomic performance may also be used to
initiate crosses, but at a smaller scale. Some of its offspring will also probably be selected but
at a much smaller extent.
A matrix of crosses was implemented to reproduce the formation of half-sib and full-sib
families during a cycle of breeding, hence taking into account unbalanced contributions of
parents to the final population. This matrix was filled only upper diagonal, with the parents
sorted from the best ones to the exotic ones. The choice and order of the parents to fill the
-
16
matrix were based on their anteriority. We considered that the best agronomic ones came from
the closer generations of breeding, and that the exotic ones came from older generations of
breeding. The “overlapping” option extracted 80% of parents from the closest generation of
breeding, and 20% from all the older generations (accounting for 10% of the resulting
progeny). On average, 285 crosses were performed per simulated selection cycle (out of 4750
possible crosses in the full matrix of crosses). Note that each cross gave, on average, 1.75 full-
sibs and that each parent was found, on average, in ten progenies. Thus, each individual is
related, on average, to 8.25 half-sibs and related to 0.75 full-sib.
The resulting half-sib families sizes are presented in figure 2.
Figure 2 around here
Performing n breeding cycles: After the first generation, a loop on the number of breeding
cycles was performed and the parents of crosses, the resulting progenies and the phenotypic
data were stored. All available generations were used to build the next. The last breeding
cycle, NG, was used for QTL detection.
Note that at the beginning, all the allele frequencies were equal, which was not the case after
many generations due to genetic drift and non-panmictic conditions. NP alleles with different
effects were possible at each QTL locus and at each marker. All the markers and QTL were in
full linkage disequilibrium at G0 but were not after many recombinations (5 generations of
self-pollination by cycle, and NG cycles before the mapping generation).
Thus, it is easy to anticipate that the information carried by markers and QTL will be different
in many cases, and that ANOVA based method are likely to be unefficient.
As the goal of this article is not to anticipate the influence of selection on the QTL detection,
we did not performed recurrent pedigree selection on the value of the quantitative trait for the
-
17
choice of the parents, or intra-cross-selection during the self pollination process. We just
wanted to anticipate the influence of the structure of the population and the effect of the
design of crosses and breeding cycles for a non-selected trait, at non-selected locus.
Design of simulations:
Standard setting: as proposed by XU (1998), instead of performing simulations on a factorial
design of every possible parameter combinations, we chose a standard setting for each
parameter. The conditions of this standard setting were 21 chromosomes of length 100 cM
each with 11 markers spaced every 10 cM, a QTL segregating at position 45 on chromosome
1 with heritability 0.1. Npoly=40 polygenic QTL were randomly placed on the other
chromosomes setting a total genetic heritability (h²g) of 0.435 (with standard deviation 0.051
among the 100 repetitions). For each cycle, about 500 individuals were created at the F6 stage
coming from 100 parents. The studied generation came from the 10th cycle of breeding,
having considered NP=20 founder lines at the beginning of selection.
Comparison of IBD formula (1) and IBD formula (2) for three levels of QTL heritability:
Both formulae were tested for the same set of 100 independent replicates. Three different
quantitative traits per replicate were created at the last generation of breeding scheme with
QTL heritabilities 0.05, 0.1 and 0.2. Total genetic heritability (h²g) was initially set to 0.5 but
after 10 generations of crosses (and genetic drift), some stochastic differences between
replicates appeared. Thus, h²g values are 0.427 (0.063 standard deviation among 100
replicates) for h²QTL=0.05, 0.435 (0.051) for h²QTL=0.1, and 0.456 (0.047) for h²QTL=0.2.
Different experimental conditions: We first tested the robustness of formula (2) for different
marker bias conditions. To avoid stochastic differences between the tested conditions, the 100
same replicates from the standard setting were used before adding bias to the marker
information. Thus, differences between the results within those conditions can be directly
-
18
attributed to this added bias on marker information. Created biases included (1) missing
marker information (NA) (10% of randomly placed NA in the progenies), (2) portion of non-
IBD Alike-In-State (AIS) alleles (a portion of allele codes were changed to other existing
allele codes in parents and progeny files to obtain 25% of AIS (i.e. not IBD) alleles in the
whole genome).
We then varied out parameters around the standard setting, changing only one parameter at a
time: these parameters include (1) the total number of founder lines in the base population
(NP: 40 vs 20); (2) the number of breeding cycles (NG: 20 vs 10) ; (3) the marker density (d:
20 vs 10) ; (4) the number of lines at the mapping generation (n: 250 vs 500) and number of
parents (m: 50 vs 100)
We also tested the 20 cM marker density condition (3) with 25% of AIS alleles in order to
anticipate the effect of this added bias at a lower marker density. Formula (2) was used for the
analysis of all these variations, except for the NG=20 case, where we tested both formulae.
Some parameters remained constant for all the simulations: the total number of chromosomes
(21), the number of polygenic QTL (40), the position of the QTL on chromosome 1 (45 cM
except for the d=20cM case where the QTL position is 50 cM), the total genetic heritability
originally set to 0.5.
RESULTS
We report the results of the two step variance component analysis on the different settings
below. Due to computer constraints, only every third centiMorgan was tested for the presence
of a QTL. Under each condition, the detection was performed for 100 random replicates.
Parameters estimates and their standard error are reported. The empirical thresholds were
computed from the analysis of 1000 replicated data sets. In the special case of the estimation
of the QTL position, we measured a confidence interval (CI) based on the LOD-Drop-Off
-
19
(LDO) method (LANDER and BOTSTEIN 1989) calculated for each significant replicate. With
this interval, we calculated the frequency at which the true QTL location was included in the
LOD- Drop-Off CI. We also measured the power that the true QTL position was included in
the CI determined by 4 times the standard error of the estimated position. Finally, for the
significant replicates and at the true QTL position, we calculated the bias on the QTL
heritability induced by our method, for both formulae. We then computed the QTL
heritability over an interval including the true QTL position.
Figure 3 around here
The average likelihood ratio test profiles over 100 replicates for the different settings explored
are presented in Figure 3. As expected, we notice a strong influence of the magnitude of the
QTL effect (i.e. the heritability of the QTL) on the LR profile (Figure 3a). Formula (2) -
which takes into account ancestor pedigree relationships as estimated by markers, to infer the
IBD probabilities - highly outperforms formula (1) for the three levels of QTL heritabilities in
terms of detection power. For subsequent simulations, we used formula (2) only.
The method seems robust to biased information on markers (Figure 3b). Indeed, likelihood
ratio test profiles under conditions supposed to be found in real breeding schemes (i.e. non-
informative alleles or wrongly informative, biased map estimation) shows only small
differences between the different tested conditions, except for the 10% missing marker
information on the last generation where the LR profile lies below the profile obtained with
complete marker information. Alike In State alleles that come from different founder lines –
hence linked to different, non-IBD QTL alleles - have only a small influence on the LR curve.
Figures 3c and 3d show the variations around the standard setting, for the number of founders,
number of breeding cycles, total number of F6 and marker density. For a constant mean size
-
20
of the derived populations (10 F6 lines per parent), switching from 500 to 250 F6 generates a
strong decrease in detection power (see graph 3c, lower curve). We also notice on this figure a
very low influence of the number of breeding cycles (NG=20) and a strong effect of the
marker informativeness represented by the number of founder lines (NP=40 against 20).
Finally, we notice a low influence of the marker density, except on the smoothness of the
curve which presents informativeness peaks close to the marker positions (Figure 3d).
Nevertheless, with 25% of AIS alleles, the QTL detection power was found more sensitive to
a decrease in marker density from a value of one marker every 20 cM downwards. With a
marker every 10 cM, the decrease in detection power is not very large (see Figure 3b).
Table 1 around here
The ability of the method to accurately estimate the parameters of interest can be judged from
the results presented in Table 1. The accuracy of the QTL estimated position increases with
the switch from formula (1) to formula (2) and also, as expected, with higher QTL
heritabilities. Nevertheless, formula (2) leads us to overestimate both QTL and total genetic
heritabilities, more than formula (1) does.
For the different designs explored, the smaller population size is the parameter which affects
the precision of the QTL position estimate most. On the opposite, the higher number of
founder lines at the beginning of crosses gives the best estimates, as it increases the chance to
have polymorphic markers between the parents. The influence of the number of breeding
generations on the accuracy of the parameter estimates also turned out to be small.
Finally, we notice that the 20 cM density case yields accurate results even if the number of
markers available to build the relationship matrix and infer the IBD is divided by two.
Nevertheless, both heritability estimates are less accurate than with a 10 cM density map.
-
21
Table 2 around here
The empirical threshold values of LR test statistics over 1000 replicated simulations are
reported in Table 2. For all the designs, the critical values are nearly equivalent. This is not
really surprising as the number of parameters being tested in the random model strategy
remains the same. We also report in Table 2 the average LR test statistics and the power
estimates for a Type I error α=0.05 over 100 replicated simulations. First we notice that the
value of the LR test statistic increased with the value of the QTL heritability and that the
chance to detect QTL is increased by using formula (2).
Biased marker information (non-IBD but AIS marker alleles, missing genotypes) does not
really influence the detection, except for 10% of missing data where the loss in power reaches
14% in comparison to the standard setting. For the different experimental designs, a higher
number of founder lines tends to increase the detection power. Finally, the simulations where
d=20cM and NF6=250 give correct detection powers in comparison to the standard setting.
Table 3 around here
In Table 3, we report the size of the confidence interval determined by four standard
deviations of the estimated position or the drop by one and by two LOD units. These CI have
been established by only taking the significant runs into account. We also report the
frequency at which the true position (i.e. 45 cM) is included in the delimited interval
(percentage of inclusion).
We notice that the drop by two LOD units gives the confidence interval that is closest to 95%
for the percentage of inclusion. In most cases, this interval is larger or equal to the one
delimited by four standard deviations. Nevertheless, as the LOD drop-off method does not
-
22
give symmetric intervals, it takes into account the information of the curve in a better way.
We also notice that the drop off by one LOD unit gives appropriate confidence intervals only
for the highest QTL heritability.
Table 4, 5, 6 around here
Table 4 shows, for the different levels of QTL heritabilities, the parameter estimates for the
replicates that are significant under the empirical threshold. We notice that, in comparison to
Table 1 which includes all the replicates, the overestimation of the QTL heritability is higher.
However, the estimated position is more accurate when averaging over the significant
replicates only, yielding a smaller standard deviation also.
In Table 5, we present the QTL and total genetic heritabilities at the true QTL position to
detect possible bias from the method under formula (1) and (2). We notice that formula (2)
gives more correct estimates at the true QTL position than formula (1) for all levels of QTL
heritabilities. As formula (2) gives almost unbiased estimates at the true QTL position, a
corrective factor for the heritability could then be worked out as a function of the estimated
position. We thus report in Table 6 an attempt to give more appropriate estimate of the
heritability for this kind of methods. We built an “Averaged Confidence Interval Heritability”
by averaging the heritability over the confidence interval around the position estimate (which
is supposed to include the true QTL position), instead of taking only a point heritability
estimate at the detected position. We notice that this corrective factor yields results very close
to the true parameter value with formula (2) but underestimates both heritabilities with
formula (1).
-
23
DISCUSSION
Obviously, many statistical methods already exist to map QTL in inbred plant material;
however, these methods mainly focus on a single bi-parental cross. Other methods have been
developed to address more challenging population structures (XIE et al. 1998; XU 1998; Yi
and XU 2001, for example). Nevertheless, these methods do not appear to be easily extended
to highly fragmented populations, at any selfed or back-crossed generation, and coming from
many different parents. They also do not take into account the possibility for alleles to be IBD
if ancestor pedigrees are not available.
In this study, we extended the QTL mapping methodology proposed by GEORGE et al. (2000)
and based on a two-step IBD variance component approach, to typical plant breeding
populations made up of selfed lines which may have either: one or two parents in common,
parents related to each other or not related to each other. In this paper we studied the simplest
possible scenario where no relevant information about the ancestors’ pedigrees was available.
The power and accuracy of this method were assessed using simulated data mimicking
conventional breeding programs in cereals, in an effort to reproduce actual conditions of
marker and gene frequencies and linkage disequilibrium across the parental lines.
In constructing the matrix of IBD probabilities, a more thorough use of the marker data was
achieved by calculating the genetic similarities between the parents. The extent to which the
matrix G was modified from formula (1) to formula (2) is quite large. The proportion of PIBD
values equal to zero with formula (1) – those values between non-sib lines – and replaced by
non-zero values, was equal to 91%, as calculated from the distribution of family sizes featured
in Figure 2. The inferred relatedness patterns between non-sibs leads to a substantial
improvement of the accuracy of the position estimates and of the QTL detection power.
Nevertheless, a downside of this improvement is a stronger overestimation of the QTL
heritability as we shall discuss below.
-
24
The method performed quite well for all the tested sources of bias (missing marker data, non-
IBD alike-in-state alleles). Two complementary explanations can be put forward to explain
the observed loss in statistical power in the settings with non-IBD AIS alleles, observed with
the lower marker density:
1- the lower chance to have informative markers flanking the interval being scanned.
Thus, informative flanking markers had to be fetched further apart on average, thus
decreasing in turn the estimates accuracy of the putative QTL’s allelic state.
2- A higher proportion of alleles that are AIS but not IBD should also have generated an
upward bias of some of most genetic similarity estimates between the parents. This, in
turn, will have affected the estimates of IBD probabilities and concomitantly the
additive relationships between individuals thereby generating an upward bias of their
estimates for non-full-sibs.
In our population design, there is a strong within family linkage disequilibrium that can be
exploited by comparing the parent’s genotypes to the current F6, which accumulated relatively
few cross-overs. Formula (1) is solely based on the utilization of this linkage disequilibrium,
and is similar to that used by XIE et al. (1998). Formula (2) can be viewed, loosely speaking,
as an attempt to merge, to some extent, several families together on the basis of the likelihood
that the parents share the same alleles identical-by-descent at the putative locus. The power
increase obtained by using formula (2) follows the same principle as that obtained by XIE et
al. (1998) in his Table 7 when he switched from a 100 x 5 sampling strategy to a less
fragmented 50 x 10.
The same comparison can be made to explain the increased detection power observed after 20
breeding cycles instead of 10. Due to the very uneven parental contribution to the crossing
scheme at each breeding cycle, some genetic drift takes place regularly during the breeding
-
25
cycles leading to the loss of certain haplotypes. Cross-overs increasingly occur between
chromosome blocks with similar haplotypes and are thus genetically ineffective. This could
be compared to a situation where a smaller number of “effective” parents were used, which,
for a constant progeny size, gives enhanced power (XIE et al., 1998). This explanation is
confirmed by the comparison between the use of formula (1) and formula (2) for the same 100
replicates, for the NG=20 setting. Formula (2) anticipates the similarities between the parents
in a better way, hence yielding better detection power.
QTL heritability readjustment before mapping
In preliminary simulations, QTL heritabilities changed, from the founder line generation to
that of the mapping F6 lines, due to random genetic drift and non-panmictic conditions in
small populations, hence yielding different heritabilities at the last generation. What we are
concerned with is the QTL heritability at the current mapping generation to allow sounder
comparison with published results. It therefore seemed sensible to choose that one, as
opposed to that at the founder generation, as an entry parameter for our study. It is also to
conciliate a germplasm history effect and a resulting QTL heritability set as a parameter, that
we did a systematic readjustment of this parameter at the end of the ten breeding cycles. In so
doing, we did not compromise too much with the assessment of the germplasm history effect
as far as allelic frequencies and the ratio of the QTL effect over the other polygenic effects are
concerned.
Overestimation of the QTL heritability and proposals for a corrected heritability
CHARCOSSET and GALLAIS (1996)’s conclusions about the overestimation of h2QTL by the R2
estimator, in the standard fixed-effect ANOVA framework, cannot be directly extended to our
case, where random effects are fitted to our QTLs. In fact, if the model is known, REML
estimates of σu2 and of σv2 are unbiased since the estimates of the fixed effects and the
prediction of the random effects are unbiased – the “U” of the BLUE and BLUP acronyms.
-
26
However, in QTL mapping, the model is unknown inasmuch as we do not know whether
there is a QTL or not, or if we assume that there is one, we do not know which locus must
have its segregation’s effect fitted in the model.
In this paper, we related the mean ĥ2QTL from all stochastic realisations, in addition to that of
the significant QTL’s only. This allowed us to distinguish between the part of overestimation
due to the Beavis effect (BEAVIS 1994) and other remaining sources of bias. There did remain
some. However, when, in our simulations, h2QTL was estimated at the QTL’s real position only
– which, again, one cannot do over real data - this bias disappeared with formula (2). This
suggests that :
1- the uncertainty over the QTL’s position generates a substantial bias in the QTL
heritability estimates by itself. Since the locus retained is the one that yields a model
with the maximum likelihood ratio (Lmax), it is also likely to be the locus where a
chance association with the residuals component of the phenotype is strongest. Thus,
in terms of expectancy, the residuals vector will, on average, play towards either
decreasing or increasing ĥ2QTL with the same frequency at the QTL’s real position,
whereas it will play towards increasing it more often at Lmax, since it is, by definition,
the locus most strongly associated with the phenotypes.
2- Since formula (2) recovers more of the real information, it was quite expected that
ĥ2QTL, in this case would approach h2QTL more closely and that formula (1) would give
an under-estimate, at the QTL’s real position.
We shall focus on formula (2) (which yields more accurate heritability estimates at the true
QTL position) to propose a correction of the (over)estimated heritability at the position of
maximum likelihood. At this stage, we focused on the significant replicates only to be
representative of what we would find with real data. We plotted the LR curve and that of the
-
27
putative QTL heritability estimate from many replicates along the scanned chromosome. The
highest LR at a certain position also corresponds to the highest detected QTL heritability as
exemplified by the fact that both curves showed the same pattern, i.e. the estimated QTL
heritability and the LR curve followed proportional values along the y-axis. This property can
be mathematically demonstrated.
For lower levels of QTL heritabilities, the position was poorly estimated for many replicates
(high standard deviation of the position estimates for the set of significant replicates), and the
QTL effects were overestimated. This was interpreted as being due to the Beavis (BEAVIS
1994) effect, because chance association between residuals and genotype can yield a
maximum for the QTL heritability away from the true QTL position.
On the contrary, when drawing the same curves for the replicates with the highest QTL
heritability (i.e. 0.2), we observed the same pattern for the curves (they still follow each other,
as expected), but the detected position was more often very close to the true QTL position
(yielding a smaller standard deviation on QTL position estimates for the set of significant
replicates). Thus, the variance component method applied to high QTL heritabilities did not
yield such a bias on QTL heritability estimates. This was interpreted as due to a lower effect
of the residual component, contrary to the case of lower heritabilities.
To summarize, we pointed out that:
(i) the estimated QTL heritability curve follows the LR curve.
(ii) formula (2) yields the correct QTL heritability estimate at the true QTL position,
(iii) the poor heritability estimates at the detected position are due to wrong position
estimates, and to the averaging of heritabilities, which were obtained at their own
different position estimates (hence not at the true one).
We worked out a sort of Confidence Interval for the estimated heritability, based on the
Confidence Interval for the QTL position estimate: bearing in mind that the expectancy of the
-
28
position where the estimated heritability is equal to the true one lies at the true QTL position.
Thus a pair of loci that delimits a confidence interval (hence a set of positions) that entails the
true QTL position will also delimit a corresponding set of estimated heritabilities – at the
different likely QTL positions within this interval - that entails the true one. The expectancy
of the heritabilities calculated in the middle of the interval will be higher than the real QTL
heritability by a certain factor that only depends on the real QTL heritability and on the
experimental design. On the other hand, towards the ends of the confidence interval, the
heritability of the QTL would, on average, be underestimated, by a factor that depends not
only on QTL heritability and experimental design but also on the chosen stringency of the
confidence interval. Hence, a less stringent confidence interval will contain a higher
proportion of underestimates. What we observed was that a 95% confidence interval (as
roughly determined by a 2-LOD drop-off interval) seems to contain just the right proportion
of under- and overestimates so that when we average the estimated heritabilities over the
confidence interval, we obtain an unbiased estimate.
Leads for improvement:
Taking all the markers to calculate the genetic similarities between the parents seems to be the
most appropriate solution in order to calculate the ji,π at each putative locus. As for the
polygenic term of the model, one may argue that the calculation of the matrix A (in v ~
(0,Aσv2)) could be more precise if it was based on the markers that are actually linked to some
polygenes, i.e. to some QTLs, instead of using all the markers indiscriminately as we did in
this study. Implementing this scheme would require a forward iterative search procedure
whereby the first step of QTL mapping would be carried out with no polygenic term. The
second step would add a polygenic term with A being equal to the arithmetic mean of the G
matrices of the different positions of the QTL detected in step one. New QTLs could then be
detected due to an increase in power, since some background genetic noise would have been
-
29
removed or at least, it is expected that the new QTL position estimates will be more precise.
Hence, in the subsequent round these new position estimates would be used to update A. And
the QTL search could be iterated until a convergence criterion is satisfied. This procedure is
somewhat analogous to one option of the Composite Interval Mapping proposed by ZENG
(1994) in which a best subset of markers was chosen by stepwise regression, then used as
cofactors in the linear model adjustment in the QTL search. This procedure, though, can bring
an advantage only if a few QTLs explain the genetic variation as opposed to many with a
small effect, all over the genome.
There is still some scope for a more accurate and probably less biased estimation of the
coefficients of co-ancestries between parents and individuals in order to better estimate the
parameters of the model and increase the QTL detection power. A first attempt to improve the
PIBD estimates could be to subtract from all genetic similarities an estimated proportion of
alleles in common that supposedly unrelated lines have in common – by definition, these
alleles in common would be alike in state only and not IBD. This method was suggested by
MELCHINGER (1991). The use of STRUCTURE (PRITCHARD et al. 2000, FALUSH et al. 2003 for
linked loci), for example, would allow to group parents according to a common selection
history. Our genetic similarities between the parents would be replaced by the scalar products
of parents’ decompositions between the inferred clusters. The remaining of the PIBD
calculation would be identical. Likewise, matrix A could be computed from the same
decomposition.
The proposed method has proved to be powerful in detecting medium sized QTL (h²=0.1) in a
typical set of inbred lines from a complex pedigree, such as those created in common cereal
breeding programs. The use of this kind of methods could increase the relevance and cost
-
30
effectiveness of quantitative trait loci mapping in applied contexts and could provide an
alternative to the development of specifically designed recombinant population, by exploiting
the genetic variation actually used by plant breeders. JANSEN et al. (2003) proposed to use
parental haplotypes sharing to routinely map QTL in breeding populations. The use of
haplotypes is challenging in this kind of methods where only little information is available on
founders and on the relationships between parents. It is even more challenging with the
increasing use of SNPs versus micro-satellite markers. A mixing of the proposed IBD method
and the method proposed by JANSEN et al. (2003) would be a good solution for mapping QTL
and properly estimate the haplotype effects, at a lower marking cost. One would first detect
QTL within an IBD-based variance component framework at a low marker density then use a
higher marker density for some QTL of interest to build haplotypes and estimate their effects
within a fixed-effect framework. Such locally high density mapping could allow identifying
the haplotypes of minimum length that have the most promising effect. Besides, it would
directly provide markers to manipulate these haplotypes in breeding schemes.
The methodology developed in this article is currently applied to the analysis of real wheat
breeding data.
Acknowledgments: The authors are grateful to the Ministère de l'Economie, des Finances et
de l'Industrie for its financial support (ASG program n° 01 04 90 6058)
-
31
LITTERATURE CITED
ALMASY,L., and J. BLANGERO, 1998 Multipoint quantitative trait linkage analysis in general
pedigrees. Am. J. Hum. Genet. 62: 1198-1211.
BEAVIS W.D., 1994 The power and deceit of QTL experiments: lessons from comparative
QTL studies. American Seed Trade Association. 49th Annual Corn and Sorghum Research
Conference. Washington D.C.
BERNARDO, R., 1993 Estimation of coefficient of coancestry using molecular markers in
maize. Theor. Appl. Genet. 85: 1055-1062.
BINK, M. C. A. M., P. UIMARI, M. SILLANPÄÄ, L. JANSS, and R. JANSEN, 2002 Multiple QTL
mapping in related plant populations via a pedigree-analysis approach. Theor. Appl. Genet.
104: 751-762.
CHARCOSSET, A., and A. GALLAIS, 1996 Estimation of the contribution of quantitative trait
loci (QTL) to the variance of a quantitative trait by means of genetic markers. Theor. Appl.
Genet. 93: 1193-1201.
COCKERHAM, C. C., 1983 Covariances of relatives from self-fertilization. Crop Science 23:
1177-1180.
DONINI P., J. R. LAW, R. M. D. KOEBNER, J. C. REEVES and R. J. Cooke, 2000 Temporal
trends in the diversity of UK wheat. Theor Appl Genet 100: 912-917.
FALUSH D., M. STEPHENS and J. K. PRITCHARD, 2003 Inference of population structure using
multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164: 1567-
1587.
GEORGE A. W., P. M. VISSCHER and C. S. HALEY, 2000 Mapping quantitative trait in complex
-
32
pedigrees: a two-step variance component approach. Genetics 156: 2081-2092.
GILMOUR, A. R., B. R. CULLIS, S. J. WELHAM and R. THOMPSON, 1998 ASREML. Program
User Manual. Ed. Orange Agricultural Institute, New South Wales, Australia.
GIMELFARB, A., and R. LANDE, 1994 Simulation of marker-assisted selection in hybrid
populations. Genet. Res. 63: 39-47.
GIMELFARB, A., and R. LANDE, 1995 Marker-assisted selection and marker-QTL associations
in hybrid populations. Theor. Appl. Genet. 91: 522-528.
HALEY, C.S., and S. KNOTT, 1992 A simple regression method for mapping quantitative trait
loci in line crosses using flanking markers. Heredity 69: 315-324.
HARRIS, D. L., 1964 Genotypic covariances between inbred relatives. Genetics 50: 1319-
1348.
HOSPITAL, F., L. MOREAU, F. LACOUDRE, A. CHARCOSSET and A. GALLAIS, 1997 More on the
efficiency of marker-assisted selection. Theor. Appl. Genet. 95: 1181-1189.
JANSEN, R. C., 1993 Interval mapping of multiple quantitative trait loci. Genetics 135: 205-
211.
JANSEN R. C., J-L. JANNINK and W.D. BEAVIS, 2003 Mapping quantitative trait loci in plant
breeding populations: use of parental haplotype sharing. Crop Sci. 43: 829-834.
JIANG, C., and Z-B ZENG,, 1995 Multiple trait analysis of genetic mapping for quantitative
trait loci. Genetics 140: 1111-1127.
KEMPTHORNE, O., 1955 The correlation between relatives in inbred populations. Genetics 40:
681-691.
-
33
KOROL, A., Y. RONIN, and V. KIRZHNER, 1995 Interval mapping of quantitative trait loci
employing correlated trait complexes. Genetics 140: 1137-1147.
LANDE R., and R. THOMPSON, 1990 Efficiency of marker-assisted selection in the
improvement of quantitative traits. Genetics 124: 743-756.
LANDER, E., and D. BOTSTEIN, 1989 Mapping mendelian factors underlying quantitative traits
using RFLP linkage maps. Genetics 121: 185-199.
LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer
Associates, Sunderland, MA.
MALECOT G., 1948 Les mathématiques de l'hérédité, Ed. Masson et Cie, Paris.
MELCHINGER A. E., M. M. MESSMER, M. LEE, W. L. WOODMAN and K. R. LAMKEY, 1991
Diversity and relationships among U.S. maize inbreds revealed by restriction fragment length
polymorphisms. Crop Sci. 31: 669-678.
MOREAU L., S. LEMARIÉ, A. CHARCOSSET, and A. GALLAIS, 2000 Economic efficiency of
one cycle of marker-assisted selection. Crop Sci. 40: 329-337.
MURANTY, H., 1996 Power of tests for quantitative trait loci detection using full-sib families
in different schemes. Heredity 76: 156-165.
NEI M. and W. H. LI, 1979 Mathematical model for studying genetic variations in terms of
restriction endonucleases. Proc. Natl. Acad. Sci. 76: 5369-5373.
PRITCHARD J. K., M. STEPHENS and P. DONNELLY, 2000 Inference of population structure
using multilocus genotype data. Genetics 155: 945-959.
RODER, M. S. , K. WENDEHAKE, V. KORZUN, G. BREDEMEIJER, D. LABORIE, D. et al., 2002
-
34
Construction and analysis of a microsatellite-based database of European wheat varieties.
Theor. Appl. Genet. 106: 67-73.
SERVIN, B., C. DILLMANN, G. DECOUX and F. HOSPITAL, 2002 MDM a program to compute
fully informative genotype frequencies in complex breeding schemes. J. Hered. 93(3): 227-
228.
S-PLUS, 2000 S-PLUS guide to statistical and mathematical analyses. MathSoft,
Massachusetts Institute of Technology
XIE, C., D. D. G. GESSLER and S. XU, 1998 Combining different line crosses for mapping
quantitative trait loci using the identical by descent-based variance component method.
Genetics 149: 1139-1146.
XU, S. 1998 Mapping quantitative trait loci using multiple families of line crosses. Genetics
148: 517-524.
XU, S., and W. R. ATCHLEY, 1995 A random model approach to interval mapping of
quantitative trait loci. Genetics 141: 1189-1197.
YI, N., and S. XU, 2001 Bayesian mapping of quantitative trait loci under complicated mating
designs. Genetics 157: 1759-1771.
ZENG, Z-B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457-1468.
ZENG, Z-B., 1993 Theoritical basis of separation of multiple linked gene effects on mapping
quantitative trait loci. Proc. Natl. Acad. Sci. USA. 90: 10972-10976.
-
35
TABLE 1
Estimates of the position, QTL and total genetic heritabilities; h²QTL and h²g respectively
Experimental design h² g Position ĥ2QTL ĥ2g
(1) QTL heritability and formula
h²QTL=0.05
Formula (1)
Formula (2)
0.427 (0.068) 49.44 (25.42)
46.45 (21.84)
0.067 (0.035)
0.085 (0.041)
0.427 (0.108)
0.432 (0.107)
h²QTL=0.1 Formula (1)
Formula (2)
0.435 (0.051) 47.76 (20.88)
45.52 (17.72)
0.099 (0.042)
0.129 (0.056)
0.442 (0.081)
0.450 (0.083)
h²QTL=0.2 Formula (1)
Formula (2)
0.456 (0.047) 44.91 (7.35)
45.73 (6.89)
0.192 (0.058)
0.228 (0.065)
0.451 (0.083)
0.467 (0.091)
(2) Different experimental designs
Standard setting 45.52 (17.72) 0.129 (0.056) 0.450 (0.083)
AIS 25% 47.15 (17.99) 0.135 (0.060) 0.445 (0.084)
NA 10%
0.435 (0.051)
48.55 (18.87) 0.131 (0.098) 0.434 (0.094)
NP=40 0.451 (0.067) 48.42 (13.75) 0.136 (0.053) 0.446 (0.108)
NG=20 formula (1) 0.413 (0.059) 48.00 (18.75) 0.091 (0.039) 0.430 (0.091)
NG=20 formula (2) 0.413 (0.059) 46.89 (15.96) 0.145 (0.049) 0.436 (0.094)
d=20cM
d=20, AIS 25%
0.433 (0.052)
0.433 (0.052)
52.27 (23.10)
52.87 (24.80)
0.149 (0.070)
0.147 (0.076)
0.395 (0.088)
0.399 (0.087)
Npar=50, NF6=250 0.403 (0.077) 49.86 (23.25) 0.159 (0.070) 0.441 (0.149)
The standard setting is a QTL heritability of 0.1, a total genetic heritability (h2g) of 0.435
(with standard deviation 0.051 among the 100 repetitions) and a 10 cM marker density, with a
QTL at 45 cM. About 500 individuals are created at the F6 stage coming from 100 parents for
each breeding cycle. h2g is the result of the readjustment of the residuals in order to obtained
a given target h2QTL. The studied generation comes from the 10th cycle of breeding, having
considered 20 founder lines at the beginning of selection.
-
36
(1) Each set of runs differs from the simulated QTL heritability noted in column one, and
by the IBD formula tested. IBD formula (1) takes only into account Half-Sib and Full-
Sib relationships while formula (2) adds ancestor relationships estimated by markers.
(2) Each additional set of runs differ from the standard setting by the simulation
parameter noted in column one. Except the NG=20 case, all the other settings are
tested with formula (2) only.
Mean and standard deviations (in parentheses) are calculated among the 100 replicates.
-
37
TABLE 2
LR threshold, test statistic and QTL detection power
Threshold Test statistic Power (%)
(1) QTL heritability and formula
h²QTL=0,05 formula (1) 4.08 4.62 (3.52) 47
h²QTL=0,05 formula (2) 3.96 5.23 (3.97) 58
h²QTL=0,1 formula (1) 4.08 7.75 (5.06) 71
h²QTL=0,1 formula (2) 3.96 9.76 (6.71) 80
h²QTL=0,2 formula (1) 4.08 22.03 (10.9) 100
h²QTL=0,2 formula (2) 3.96 23.69 (10.8) 100
(2) Different experimental designs
Standard setting 3.96 9.76 (6.71) 80
AIS 25% 4.28 9.91 (6.56) 72
NA 10% 4.30 7.81 (5.53) 66
NP=40 3.70 11.29 (7.09) 89
NG=20 formula (1) 3.72 5.95 (4.15) 65
NG=20 formula (2) 3.62 10.62 (5.75) 91
d=20 4.25 9.21 (6.81) 75
d=20, AIS 25% 4.94 8.06 (6.50) 64
Npar=50, NF6=250 4.00 7.96 (3.66) 62
See Table 1 for the standard setting. Each additional set of runs differs from the standard
setting by the parameter change noted in column one. Threshold represents the empirical
threshold calculated for 1000 replicates. Test statistic is the mean and standard deviation of
the maximum of LR test for the 100 replicates. Power is the percentage of replicates with max
LR exceeding the empirical threshold.
-
38
TABLE 3
Accuracy of three methods used to infer Confidence Intervals (CI)
CI- 4 S.D. % 4 S.D. CI- 1 LDO % 1 LDO CI- 2 LDO % 2 LDO
(1) QTL heritabilities and IBD formula
h²QTL=0,05 formula (1) 86.40 91 41.78 79 86.66 97
h²QTL=0,05 formula (2) 72.20 93 41.18 79 85.43 97
h²QTL=0,1 formula (1) 63.28 90 29.10 82 74.75 99
h²QTL=0,1 formula (2) 63.40 93 31.04 86 66.62 96
h²QTL=0,2 formula (1) 29.40 96 17.26 90 34.33 97
h²QTL=0,2 formula (2) 26.00 98 15.21 91 28.21 97
(2) Different experimental designs
Standard Setting 63.40 93 31.04 86 66.62 96
AIS 25% 63.84 92 26.4 80 62.9 97
NA 10% 63.24 91 28.48 80 67.6 95
NP=40 52.24 94 26.66 85 60.34 97
NG=20, formula (1) 67.60 91 43.95 72 82.80 96
NG=20, formula (2) 62.20 91 27.30 76 66.44 97
D=20 93.60 96 36.35 75 73.96 94
D=20, AIS 25% 94.56 95 35.08 67 75.61 91
Npar=50, NF6=250 88.16 89 45.61 79 82.82 95
See Table 1 for the standard setting. “CI- 4 S.D.” represents the size of the CI calculated as
four times the standard deviation on the estimated QTL position for the significant runs, and
“% 4 S.D.” represents the percentage of times that the true position is included within the
delimited interval. “CI- 1 LDO” and “CI- 2 LDO” represent the size of the CI as determined
by the drop of one and two LOD unit (multiply by 2*ln(10) to convert in LR units)
-
39
respectively. “% 1 LDO” and “% 2 LDO” represent the number of times that the true position
is included within the delimited interval.
-
40
TABLE 4
Estimates of the QTL parameters for different levels of QTL heritability under the
empirical threshold
Experimental design h² g Position ĥ2QTL ĥ2g
QTL heritability
h²QTL=0.05
Formula (1)
Formula (2)
0.427 (0.068) 47.55 (21.61)
45.98 (18.05)
0.092 (0.032)
0.110 (0.030)
0.429 (0.110)
0.437 (0.109)
h²QTL=0.1 Formula (1)
Formula (2)
0.435 (0.051) 49.56 (15.82)
45.56 (15.85)
0.117 (0.035)
0.143 (0.051)
0.452 (0.081)
0.450 (0.08)
h²QTL=0.2 Formula (1)
Formula (2)
0.456 (0.047) 44.91 (7.35)
45.49 (6.50)
0.192 (0.058)
0.230 (0.062)
0.451 (0.083)
0.471 (0.081)
Each set of runs differs by the simulated QTL heritability noted in column one, and by the
IBD formula tested. Mean and standard deviations (in parentheses) are reported only for the
significant replicates (noted in the “Power” column of Table 2) among 100.
-
41
TABLE 5
Estimates of the QTL and total genetic heritabilities at the true QTL position (45)
h²QTL =0.05, h2g=0.427 h²QTL =0.1, h2g=0.435 h²QTL =0.2, h2g=0.456
Formula ĥ2QTL ĥ2g ĥ2QTL ĥ2g ĥ2QTL ĥ2g
(1)
(2)
0.036 (0.037)
0.051 (0.047)
0.426 (0.106)
0.430 (0.108)
0.068 (0.048)
0.094 (0.064)
0.442 (0.080)
0.447 (0.080)
0.191 (0.056)
0.225 (0.066)
0.455 (0.081)
0.471 (0.094)
Formula (1) and (2) are tested for three levels of QTL heritability at the true QTL position.
h2g is the result of the readjustment of the residuals in order to obtained a given target h2QTL.
-
42
TABLE 6
Averaged Confidence Interval Heritability for the significant replicates over the 2 Lod-
Drop-off units interval
heritabilities h²QTL =0.05, h2g=0.427 h²QTL =0.1, h2g=0.435 h²QTL =0.2, h2g=0.456
No-Selection ĥ2QTL ĥ2g ĥ2QTL ĥ2g ĥ2QTL ĥ2g
(1)
(2)
0.049 (0.028)
0.054 (0.034)
0.405 (0.104)
0.418 (0.109)
0.072 (0.039)
0.100 (0.063)
0.434 (0.083)
0.438 (0.081)
0.158 (0.062)
0.194 (0.067)
0.438 (0.085)
0.459 (0.082)
Formula (1) and (2) are tested for three levels of QTL heritability.
h2g is the result of the readjustment of the residuals in order to obtained a given target h2QTL.
-
FIGURE 1
1 2 3 NP Q1 Q2 Q3 QNP 1 2 3 NP
P1 P2 P3 P4 (…) P99 P100
matrix of crosses
* * * *
P1 P2 P79 P80
Founder lines: full linkage disequilibrium marker-QTL
G0 generation
For i=10 breeding cycles
Gi generation 500 F6 lines
QTL Detection
1 to 4 progenies per cross + half-sib relationships
circular crosses
G10 generation
+ P81-P100 : overlapping
NP sub-populations
Creation of the QTL and polygene effects
-
44
0
10
20
30
40
50
60
70
80
P1 P6 P16
P26
P36
P46
P56
P66
P76
P86
P96
Parent name
Num
ber o
f pro
geni
es /
pare
nt
MeanStandard Deviation
FIGURE 2
-
45
FIGURE 3
-
46
FIGURE LEGENDS: FIGURE 1: Simulation of the breeding scheme
FIGURE 2: Mean and standard error for 1000 replicates created by the “matrix of crosses”
function for the half-sib family size under the standard setting (100 parents, 500 F6).
FIGURE 3: Comparison of the LR profiles for (a) different levels of QTL heritabilities for
formula (1) and (2), (b) different biased marker information for the standard setting under
formula (2), (c) difference in the breeding schemes compared to the standard setting, and (d) a
density of one marker every 20 cM, with 25% of AIS non-IBD alleles.