Www. geocities.com/ResearchTriangle/Forum/4463/anigenetics.gif.
Bayesian Hierarchical Model for QTLs
Susan Simmons
University of North Carolina Wilmington
Collaborators
Dr. Edward Boone
Dr. Ann Stapleton
Mr. Haikun Bao
DNA
Chromosome
Genes
Genetic Map
Chromosome 1 of the protozoan Cryptosporidium parvum
Chromosome 1 of Homo sapiens
Alleles
Genetic Maps
Many more maps available at www.ncbi.nih.gov
Knowing information about genes now allows us to find associations between genes and outcomes (phenotypes)
Some examples
In 1989 a breakthrough was made for the disease of cystic fibrosis.
The location (or locus) is 7q31.2: the CFTR gene, the single gene responsible for this disease, is found in region q31.2 on the long (q) arm of human chromosome 7.
The disease arises when an individual has two recessive copies at this location.
An individual with one dominant and one recessive copy is said to be a carrier of the disease.
Genetic screening to determine disease.
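As a small illustration of the carrier statement above, the sketch below enumerates the offspring genotypes of two carrier parents under simple Mendelian inheritance. The allele labels `A` (dominant) and `a` (recessive) and the function name `cross` are chosen here for illustration and do not come from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Return offspring genotype probabilities for a single-locus cross.

    Each parent is a two-character string of alleles, e.g. "Aa".
    Every combination of one allele from each parent is equally likely.
    """
    # Sort the two alleles so that "aA" and "Aa" count as the same genotype.
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Two carriers: each has one dominant (A) and one recessive (a) copy.
offspring = cross("Aa", "Aa")
affected = offspring["aa"]   # two recessive copies
carriers = offspring["Aa"]   # one dominant, one recessive copy
```

Under these assumptions a quarter of the offspring carry two recessive copies and half are carriers.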
Green Revolution
The Green Revolution is the increase in food production stemming from the improved strains of wheat, rice, maize and other cereals developed in the 1960s by Dr Norman Borlaug in Mexico and others under the sponsorship of the Rockefeller Foundation.
Created new varieties of wheat and rice that produced higher yields.
QTL
Better medical treatments and increased agricultural production are only two examples in which identifying a location on the genome can have an impact.
A region on the genome (or on the chromosome) responsible for a quantitative trait (as opposed to a qualitative trait such as disease status) is known as a Quantitative Trait Locus (QTL).
Existing software
Zhao-Bang Zeng’s group at NC State has QTL Cartographer
Karl Broman (Johns Hopkins) has an R program that performs a number of algorithms for QTLs
To use these algorithms (and a number of other published algorithms), only one observation per genotype can be used
World of plants
Why plants?
Increase yield to feed our increasing population
Make plants resistant to UV-B exposure
Plants, continued
Control
– Design and Environment
– Reproduction
– Design (RIL is one of the best designs for detecting QTLs)… Alleles are homozygous
Cost
Time
Plant QTL experiments
In most experiments, a number of replicates or clones are observed within each line
A number of plant biologists reduce the replicates to a summary measure so that conventional methods can be used
Information is lost, and the results can be misleading (example in Conte et al. (unpublished))
Hierarchical model to incorporate replicates within each line
Data
Trait or phenotype $y_{ij}$, $i = 1, \dots, L$, where L is the number of lines, and $j = 1, \dots, n_i$ (the number of replicates within line i)
Design matrix X is L x M, where M is the number of markers on the genetic map
Hierarchical Model
$y_{ij} \sim N(\mu_i, \sigma_i^2)$
$\mu_i \sim N(X_i^T \beta, \sigma^2)$
Priors
$\sigma^2 \sim \text{Inverse-}\chi^2(1)$
$\beta_k \sim N(0, 100)$
$\sigma_i^2 \sim \text{Inverse-}\chi^2(1)$
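To make the two levels of the model concrete, here is a minimal sketch that simulates replicate data from it for fixed parameter values. It is not the authors' code: the 0/1 marker coding, the absence of an intercept, the function name `simulate`, and all numeric values are assumptions made for illustration.

```python
import random


def simulate(X, beta, sigma2, sigma2_lines, n_reps, seed=0):
    """Draw line means and replicate phenotypes from the two-level model.

    X            : L x M list of marker rows (assumed coded 0/1 per RIL)
    beta         : M marker effects
    sigma2       : variance of the line means around X_i^T beta
    sigma2_lines : L within-line variances
    n_reps       : L replicate counts n_i
    """
    rng = random.Random(seed)
    mu, y = [], []
    for x_i, s2_i, n_i in zip(X, sigma2_lines, n_reps):
        expected = sum(x * b for x, b in zip(x_i, beta))      # X_i^T beta
        mu_i = rng.gauss(expected, sigma2 ** 0.5)             # line-level mean
        mu.append(mu_i)
        y.append([rng.gauss(mu_i, s2_i ** 0.5) for _ in range(n_i)])  # replicates
    return mu, y


# Hypothetical example: L = 4 lines, M = 3 markers, one QTL at the second marker.
X = [[0, 1, 1], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
beta = [0.0, 2.0, 0.0]
n_reps = [3, 2, 4, 3]
mu, y = simulate(X, beta, sigma2=0.25, sigma2_lines=[1.0] * 4, n_reps=n_reps)
```

Each entry of `y` keeps all replicates for one line, which is the information a per-line summary would discard.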
Posterior Model Probability
Let $\mathcal{K}$ denote the set of all possible models. Given data D, the posterior probability of model $k_i$ is given by Bayes' rule:
$$P(k_i \mid D) = \frac{P(D \mid k_i)\, P(k_i)}{\sum_{j=1}^{|\mathcal{K}|} P(D \mid k_j)\, P(k_j)}$$
(These probabilities are implicitly conditioned on the set $\mathcal{K}$.)
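The normalization in Bayes' rule can be sketched as follows. Because marginal likelihoods of real data are often too small to represent directly, this sketch works with their logarithms; that choice, the function name, and the example values are assumptions for illustration, not taken from the slides.

```python
import math


def posterior_model_probs(log_marginals, priors=None):
    """Normalize P(D | k_i) P(k_i) over a finite set of models.

    log_marginals : list of log P(D | k_i)
    priors        : list of P(k_i); uniform if omitted
    """
    n = len(log_marginals)
    if priors is None:
        priors = [1.0 / n] * n
    log_joint = [lm + math.log(p) for lm, p in zip(log_marginals, priors)]
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_joint)
    weights = [math.exp(v - top) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log marginal likelihoods for three candidate models.
probs = posterior_model_probs([-1050.2, -1047.9, -1049.0])
```

With a uniform prior the model with the largest marginal likelihood receives the largest posterior probability, and the probabilities sum to one over the set of models considered.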
Posterior Model continued
To compute the probability of the model given the data from the previous slide, $P(k_i \mid D)$, we need to compute $P(D \mid k_i)$:
$$P(D \mid k_i) = \int P(D \mid \theta_i, k_i)\, P(\theta_i \mid k_i)\, d\theta_i$$
where $\theta_i$ is the vector of unknown parameters for model $k_i$.
Integration
This integration can become difficult since the number of unknown parameters is 2*L + M + 2. Use a Monte Carlo estimate of the integral:
$$\int P(D \mid \theta_i, k_i)\, P(\theta_i \mid k_i)\, d\theta_i \approx \frac{1}{t} \sum_{j=1}^{t} P(D \mid \theta_i^{(j)}, k_i)$$
where $\theta_i^{(j)}$, j = 1, …, t, are samples from the posterior distribution.
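The averaging step can be checked on a toy model where the integral is known in closed form. In the sketch below a single observation y is N(θ, 1) and θ has a N(0, 1) prior, so the exact marginal density of y is N(0, 2). Note the assumption: this sketch draws θ from the prior, for which the plain average of likelihood values is an unbiased estimate of the integral; it does not attempt to reproduce the sampling scheme used in the slides, and the toy model is not the hierarchical QTL model.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mc_marginal_likelihood(y, t=100_000, seed=1):
    """Estimate P(y) = integral of P(y | theta) P(theta) dtheta by averaging
    the likelihood over t draws of theta from the N(0, 1) prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(t):
        theta = rng.gauss(0.0, 1.0)          # theta^(j)
        total += normal_pdf(y, theta, 1.0)   # P(y | theta^(j))
    return total / t


estimate = mc_marginal_likelihood(1.0)
exact = normal_pdf(1.0, 0.0, 2.0)   # closed-form marginal for this toy model
```

Comparing `estimate` with `exact` gives a quick sanity check on the number of draws t before applying the same idea to a model with no closed form.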
Search strategy
The activation probability, $P(\beta_j \neq 0 \mid D)$, is defined as
$$P(\beta_j \neq 0 \mid D) = \sum_{i} P(\beta_j \neq 0 \mid k_i, D)\, P(k_i \mid D)$$
There are $2^M$ potential models, which can make the calculation of $P(\beta_j \neq 0 \mid D)$ computationally intensive
Instead, we define a conditional probability search approach
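For a small number of markers the activation probability can be computed by brute force, which also shows why the full enumeration does not scale. The sketch below assumes that the conditional term is 1 when marker j is included in model k_i and 0 otherwise; that simplification, the function name, and the posterior model probabilities in the example are all hypothetical.

```python
from itertools import product


def activation_probabilities(model_probs):
    """Sum posterior model probabilities over the models that include each marker.

    model_probs maps an inclusion tuple (one 0/1 entry per marker) to P(k_i | D).
    """
    n_markers = len(next(iter(model_probs)))
    activation = [0.0] * n_markers
    for model, prob in model_probs.items():
        for j, included in enumerate(model):
            if included:
                activation[j] += prob
    return activation


# All 2^M inclusion patterns for M = 3 markers.
all_models = list(product([0, 1], repeat=3))
model_probs = {model: 0.0 for model in all_models}
# Hypothetical posterior model probabilities (they sum to 1).
model_probs[(0, 1, 0)] = 0.7
model_probs[(0, 1, 1)] = 0.2
model_probs[(0, 0, 0)] = 0.1

activation = activation_probabilities(model_probs)
```

The dictionary already has 8 entries for 3 markers and doubles with every additional marker, which is the cost the conditional search is meant to avoid.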
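[Figure: conditional search tree. Top level: chromosomes C1, C2, C3, C4, C5. C2 splits into C21 and C22, and C21 into C211 and C212. C4 splits into C41 and C42, C42 into C421 and C422, and C421 into C4211 and C4212.]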
Simulated data
Using the line information from the Bay x Sha RIL population, a single QTL was simulated on the fourth marker of the first chromosome.
The Bay x Sha population has 5 chromosomes.
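[Figure: conditional search tree for the simulated data, with the probability at each node]
C1: 1
  C11: 1
    C111: 0.818
      C1111: 0.041 (M1)
      C1112: 0.014 (M2)
    C112: 0.927
      C1121: 0.083 (M3)
      C1122: 1 (M4)
  C12: 0.9362
    C121: 0.114
    C122: 0.108
C2: 0.4
C3: 0.6
  C31: 0.063
  C32: 0.063
C4: 0.4
C5: 0.0029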
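One way to read the node probabilities reported for the simulated data is as a top-down search that only expands regions whose probability is high enough. The sketch below does this with the values from the slide. The slides do not state the expansion rule, so the 0.5 threshold, the rule that a node's children are named by appending 1 or 2, and the function name are assumptions for illustration only.

```python
# Node probabilities as reported for the simulated data.
node_probs = {
    "C1": 1.0, "C2": 0.4, "C3": 0.6, "C4": 0.4, "C5": 0.0029,
    "C11": 1.0, "C12": 0.9362,
    "C111": 0.818, "C112": 0.927, "C121": 0.114, "C122": 0.108,
    "C1111": 0.041, "C1112": 0.014, "C1121": 0.083, "C1122": 1.0,
    "C31": 0.063, "C32": 0.063,
}


def conditional_search(probs, roots, threshold=0.5):
    """Expand only nodes at or above the threshold; return surviving leaves."""
    selected = []
    stack = list(roots)
    while stack:
        node = stack.pop()
        if probs[node] < threshold:
            continue  # prune this region and everything below it
        # Children are assumed to be named by appending "1" or "2".
        children = [node + d for d in "12" if node + d in probs]
        if children:
            stack.extend(children)
        else:
            selected.append(node)  # no recorded subdivision: a leaf
    return sorted(selected)


found = conditional_search(node_probs, ["C1", "C2", "C3", "C4", "C5"])
```

Under these assumptions the only leaf that survives is C1122, the node labelled M4, which matches the marker where the QTL was simulated.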
Comments
Need to run the model on more simulations
Would like to compare this search strategy to a stochastic search
Would like to include epistasis in the model
Thank you