Introduction to Genetics
Debashis Ghosh
Professor and Chair, Biostatistics and Informatics, ColoradoSPH
Question we tackle today
• What do we mean by a gene?
• Steve Mount (ongenetics.blogspot.com): “A gene is all of the DNA elements required in cis for the properly regulated production of a set of RNAs whose sequences overlap in the genome.”
• Mark Gerstein (2007, Genome Research): “The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.”
What is a gene?
• No “one-size-fits-all” definition
• The previous definitions are useful to contextualize data that are generated from experiments
• Thinking carefully about evolution and the constraints it has placed on functions is also important
From Genotype to Phenotype
• Full genotypes (genomes) are coming… but inheritance is complex
• Genetic markers are characters inherited in a way that is simple enough to easily track
• Want to find genetic markers that explain or predict phenotypes
  – e.g., disease, susceptibility
  – Ideally, the marker would be causative
• But that is rare
Alleles as Genes
• At each gene locus, we have two alleles, one transmitted to us by our father, and one by our mother.
• Usual assumption: Each parent randomly transmits one of his/her alleles to the child
• In real datasets, the alleles are typically observed as DNA variants referred to as single-nucleotide polymorphisms (SNPs)
Diploid Inheritance
[Figure: chromosome pairs with one copy from Mom and one from Dad, labeled Heterozygote and Homozygote]
Phenotypic Dominance
[Figure: a heterozygote with one allele from Mom and one from Dad. Panels: light blue dominant, dark blue recessive; dark blue dominant, light blue recessive; mixed dominance]
Diploid Inheritance
[Figure: heterozygote and homozygote when dark blue is dominant; the recessive phenotype is only visible in the homozygote]
Mendelian Ratios
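The classic ratios follow directly from each parent transmitting one of its two alleles at random. A minimal Python sketch that enumerates the four equally likely allele combinations (a Punnett square); the function name and the "A"/"a" allele labels are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(mother, father):
    """Return genotype probabilities for a child of two diploid parents.

    Each parent is a 2-character string such as "Aa"; each allele is
    transmitted with probability 1/2, independently between parents.
    """
    counts = Counter("".join(sorted(m + f)) for m, f in product(mother, father))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}

ratios = offspring_genotypes("Aa", "Aa")
# If "A" is dominant, the dominant phenotype appears in AA and Aa offspring.
dominant_phenotype = ratios["AA"] + ratios["Aa"]
```

For two heterozygous parents this gives the 1:2:1 genotype ratio and, with a dominant allele, the 3:1 phenotype ratio.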
Recombination
[Figure: chromosomal segment in Mom (she’s a diploid, remember), with one copy from Grandma and one from Grandpa; chromosomal segment in you (you’re diploid too), with one copy from Mom and one from Dad]
Crossing Over
[Figure: non-sister chromatids of the homologous chromosomes from Grandma and Grandpa recombine (cross over) during meiosis. Of the products of meiosis, one is inherited by you; the others are lost (except in tetrad analysis)]
Recombination: Basic Points
• Recombination switches which chromosome in the parent (i.e., originating from which grandparent) is passed along to the offspring
• Alleles physically adjacent on a chromosome are more likely to be passed on together than alleles far apart
• Alleles very far apart or on different chromosomes are inherited randomly
Finding Disease Genes
• Assemble data set of probands
• Assemble data set of control population
• Might have pedigree if the disease runs in families
• Might have trios to determine linkage
  – Proband plus two parents
• Look for linkage between genetic markers and disease
  – In pedigree
  – In dataset of less related individuals
Genetic Markers
• Polymorphic in population
  – Different variants in different individuals
  – Single Nucleotide Polymorphism (SNP)
  – Variable Number of Tandem Repeats (VNTR)
    • Minisatellites
  – Short Tandem Repeats (STR)
    • Microsatellites
    • Very high mutation rate: strand slippage
• Haplotype
  – A set of closely linked SNPs inherited as a unit
Linkage Analysis
• Set of variable markers distributed throughout genome
• Identify linkage regions (haplotypes) that cosegregate (are inherited) with disease or trait
Pedigree Analysis
• Tabulate the occurrence of a trait in an extended family
  – Pedigree is family’s mating history
Assumptions and Complications
• Single gene with Mendelian inheritance
  – Best use of extended families
  – Few extended families with trait
• Quantitative traits are multigenic
  – Includes most widespread or “common” inherited diseases
  – Sib pairs are best for complex traits with incomplete penetrance (see next slide)
Incomplete Penetrance
• Not everyone with the genotype will have the disease
  – Delayed or adult onset
  – Mild or undetectable symptoms
  – Environmental and developmental factors
  – Unknown genetic factors
• Disease allele = increased probability of disease (relative risk)
• We don’t always know in a pedigree who has the disease genotype!
Evaluating Linkage
• Remember, an individual is a recombinant with respect to two genes, A and B, if it inherits the allele from one parental chromatid at A and the allele from the other parental chromatid at B
• The recombination fraction $\theta$ is the probability that a child is recombinant
• If A and B are tightly linked, then $\theta$ is small
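The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming a doubly heterozygous parent whose two chromosomes carry haplotypes AB and ab; the function name and the symbol theta for the recombination fraction are choices made here, not taken from the slides.

```python
def gamete_probabilities(theta):
    """Gamete probabilities for a parent with haplotypes AB and ab.

    theta is the recombination fraction between loci A and B,
    so it must lie between 0 (complete linkage) and 0.5 (unlinked).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    parental = (1.0 - theta) / 2.0   # AB or ab: no recombination
    recombinant = theta / 2.0        # Ab or aB: recombination between A and B
    return {"AB": parental, "ab": parental, "Ab": recombinant, "aB": recombinant}

tight = gamete_probabilities(0.01)     # tightly linked loci
unlinked = gamete_probabilities(0.5)   # far apart or on different chromosomes
```

With theta = 0.5 all four gametes are equally likely, which is the "inherited randomly" case from the recombination slide.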
Simple LOD Scores
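• Total number of offspring, P
• Number of recombinant offspring, R
• Likelihood of the data, as a function of the recombination fraction $\theta$: $L(\theta) = \theta^{R} (1-\theta)^{P-R}$
• Maximum likelihood estimate: $\hat{\theta} = R/P$
• LOD score for linkage in pedigree is $\mathrm{LOD} = \log_{10} \frac{L(\hat{\theta})}{L(1/2)} = \log_{10} \frac{\hat{\theta}^{R} (1-\hat{\theta})^{P-R}}{(1/2)^{P}}$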
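A minimal Python sketch of this calculation, assuming the standard binomial form of the likelihood, theta^R (1 - theta)^(P - R), compared against the no-linkage value theta = 0.5. The function name is hypothetical, and the estimate is capped at 0.5 because a recombination fraction cannot exceed that value.

```python
import math

def lod_score(recombinants, offspring, theta=None):
    """LOD score comparing recombination fraction theta against 0.5.

    If theta is None, the maximum likelihood estimate R/P (capped at 0.5) is used.
    """
    r, p = recombinants, offspring
    if theta is None:
        theta = min(r / p, 0.5)

    def log10_likelihood(t):
        # log10 of t^R * (1 - t)^(P - R); terms with a zero count are skipped
        out = 0.0
        if r > 0:
            out += r * math.log10(t)
        if p - r > 0:
            out += (p - r) * math.log10(1.0 - t)
        return out

    return log10_likelihood(theta) - log10_likelihood(0.5)

no_recombinants = lod_score(0, 10)    # equals 10 * log10(2)
half_recombinant = lod_score(5, 10)   # estimate is 0.5, so the LOD is 0
```

Ten offspring with no recombinants already give a LOD just above 3, which is why fully informative meioses are so valuable in small pedigrees.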
Complications
• Need to know phase, genotypes of parents, to identify recombinants
  – Can estimate informativeness of additional data depending on heterozygosity of markers
• Many disease versus marker comparisons are involved
  – Multiple comparisons
  – But, markers are not independent
• Population structure
• LOD scores > 3 (odds of 1000:1) give a general sense of linkage; > 5 is very strong
Population structure
• Genetic markers have different patterns in different populations; this can confound associations between genetic markers and disease phenotypes.
Realistic Complications
• Include Penetrance(X|G)
  – Likelihood of observing trait X given the genotype G
• Prior(G)
  – Likelihood of observing the genotype in an individual
• Transmit(Gm|Gk,Gl,$\theta$)
  – Probability that offspring will have genotype Gm given parental genotypes Gk and Gl, and the recombination parameter $\theta$
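To show how Penetrance(X|G) and Prior(G) combine for a single individual, here is a small Python sketch. It assumes a single locus with Hardy-Weinberg genotype frequencies as the prior; that assumption, the function names, the allele frequency, and the penetrance values are all illustrative and do not come from the slides.

```python
def hardy_weinberg_prior(q):
    """Prior(G) for G = 0, 1, 2 copies of the disease allele.

    Assumes Hardy-Weinberg proportions with disease-allele frequency q
    (an illustrative choice of prior).
    """
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}

def prob_affected(penetrance, prior):
    """P(affected) = sum over G of Prior(G) * Penetrance(affected | G)."""
    return sum(prior[g] * penetrance[g] for g in prior)

def posterior_genotype(penetrance, prior):
    """P(G | affected), by Bayes' rule."""
    total = prob_affected(penetrance, prior)
    return {g: prior[g] * penetrance[g] / total for g in prior}

prior = hardy_weinberg_prior(0.01)         # hypothetical allele frequency
penetrance = {0: 0.001, 1: 0.8, 2: 0.8}    # hypothetical penetrances
posterior = posterior_genotype(penetrance, prior)
```

The posterior makes the pedigree problem concrete: even for an affected person, the disease genotype is known only with some probability.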
LOD Graph
• Can look at LOD score over a range of $\theta$'s, not just the MLE.
• Usual assumption is LOD > 3 is evidence for linkage, LOD < -2 is evidence for exclusion
Example: 27 recombinants out of 139 gametes (example from S. Purcell)
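The LOD curve for this example can be traced with a short Python sketch. It assumes the simple binomial likelihood for R recombinants out of P gametes, compared against theta = 0.5, and evaluates it on a grid of candidate recombination fractions; the grid spacing and variable names are arbitrary choices.

```python
import math

R, P = 27, 139  # recombinants and total gametes, from the example above

def lod(theta):
    """log10 likelihood ratio of theta versus the no-linkage value 0.5."""
    return (R * math.log10(theta)
            + (P - R) * math.log10(1.0 - theta)
            + P * math.log10(2.0))

thetas = [i / 100 for i in range(1, 50)]    # 0.01, 0.02, ..., 0.49
curve = [(t, lod(t)) for t in thetas]
best_theta, best_lod = max(curve, key=lambda pair: pair[1])
linked_range = [t for t, score in curve if score > 3.0]

print(f"grid maximum: theta = {best_theta:.2f}, LOD = {best_lod:.2f}")
print(f"LOD > 3 for theta from {linked_range[0]:.2f} to {linked_range[-1]:.2f}")
```

The grid maximum sits next to the estimate R/P = 27/139, and the range of theta values with LOD > 3 gives a rough support interval for the location.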
Recombination Probability and Distance along Chromosome
• Recombination fraction does not increase linearly with distance
  – Multiple recombination events possible over greater distances, but also interference
• Can estimate genetic distance from recombination rates
  – Measure in Morgans, or cM
  – Genetic distance, the expected number of crossovers, is additive
Mapping Functions
• Haldane’s mapping function
  – Crossovers are assumed random and independent
• Kosambi’s mapping function
  – Models interference: crossovers not too close
  – Most popular
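For reference, the two mapping functions in their usual textbook form can be sketched in Python as follows. Distances are in Morgans and theta is the recombination fraction; the function names are chosen here for illustration.

```python
import math

def haldane_distance(theta):
    """Map distance in Morgans under Haldane's function (no interference)."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

def kosambi_distance(theta):
    """Map distance in Morgans under Kosambi's function (models interference)."""
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

def haldane_theta(distance):
    """Inverse of Haldane's function: recombination fraction from distance."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance))

def kosambi_theta(distance):
    """Inverse of Kosambi's function: recombination fraction from distance."""
    return 0.5 * math.tanh(2.0 * distance)

for theta in (0.01, 0.10, 0.30):
    h = 100 * haldane_distance(theta)   # convert Morgans to cM
    k = 100 * kosambi_distance(theta)
    print(f"theta = {theta:.2f}: Haldane {h:.1f} cM, Kosambi {k:.1f} cM")
```

For small theta both functions give a distance close to theta itself; they diverge as theta approaches 0.5, where Haldane assigns the larger distance.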
Genetic versus Physical
• Mapping is not simple
  – Recombination rate varies along chromosomes
• Male versus Female
  – Men 28.51 M (Morgans) over whole genome
    • 1.05 Mb/cM
  – Women 42.96 M (excluding X)
    • 0.88 Mb/cM
• In Drosophila, about 0.4 Mb/cM
Modeling Penetrance
• Single locus, three genotypes
• If …: disease is Mendelian dominant
• If …: disease is Mendelian recessive
• Spontaneous mutations: …
• Incomplete penetrance: …
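One common way to write such a single-locus model is as three penetrances, one per genotype. The Python sketch below uses that convention; the notation (f0, f1, f2), the model names, and every numeric value are assumptions made for illustration and may differ from the notation on the original slide.

```python
# Penetrance vectors (f0, f1, f2) = P(disease | 0, 1, 2 copies of the disease allele).
# Notation and values are illustrative assumptions.
MODELS = {
    "mendelian_dominant": (0.0, 1.0, 1.0),
    "mendelian_recessive": (0.0, 0.0, 1.0),
    "dominant_with_sporadic_cases": (0.02, 1.0, 1.0),    # f0 > 0
    "dominant_incomplete_penetrance": (0.0, 0.7, 0.7),   # carriers not always affected
}

def relative_risk(model, genotype, baseline=0):
    """Ratio of disease probability for a genotype versus a baseline genotype."""
    f = MODELS[model]
    if f[baseline] == 0:
        # Ratio is unbounded when the baseline genotype never shows the disease.
        return float("inf") if f[genotype] > 0 else float("nan")
    return f[genotype] / f[baseline]

rr = relative_risk("dominant_with_sporadic_cases", genotype=1)
```

Under this reading, a nonzero f0 captures affected individuals who do not carry the disease allele, and values of f1 or f2 below 1 capture incomplete penetrance.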
Extending Analysis
• SNPs scattered throughout genome
  – LOD scores for regions, not individual marker
• Multipoint linkage analysis
  – Establish order relationship among 3+ markers
• Non-parametric analysis can be better for complex traits, incomplete penetrance
  – Work with affected siblings
  – Less statistical power than model-based methods
• Identical by descent (IBD) versus chance
Non-Parametric
• Concerning siblings or other relatives
  – Need “both affected” and “only one affected” pairs
• Correlate shared IBD alleles with affected state, proportion in two classes
  – High correlation means linkage to disease
(Mention T1D)
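One simple version of this idea is the mean IBD sharing test for affected sib pairs, sketched below in Python. It assumes IBD status is known without error for every pair and uses only pairs where both siblings are affected; the counts are hypothetical and the function name is chosen here.

```python
import math

def mean_ibd_test(ibd_counts):
    """z statistic for excess IBD sharing among affected sib pairs.

    ibd_counts[k] is the number of pairs sharing k alleles IBD (k = 0, 1, 2).
    Without linkage, k has probabilities 1/4, 1/2, 1/4, so the proportion
    shared (k / 2) has mean 1/2 and variance 1/8.
    """
    n = sum(ibd_counts)
    mean_share = sum(k * c for k, c in enumerate(ibd_counts)) / (2.0 * n)
    z = (mean_share - 0.5) / math.sqrt(0.125 / n)
    return mean_share, z

# Hypothetical counts for 100 affected sib pairs sharing 0, 1, 2 alleles IBD.
share, z = mean_ibd_test([15, 50, 35])
```

A mean sharing proportion well above 0.5 at a marker suggests that affected siblings inherited the same parental segment more often than chance predicts, which is the signature of linkage.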
(Genomewide) Association Studies
• Correlate markers with disease over a large population
• Marker may itself be the disease variant (rare)
• Large regions of chromosome in linkage disequilibrium with disease allele
  – Marker is in disease gene haplotype
• Regions of chromosome tend to be inherited as a unit
  – Tapers off over time due to recombination
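The tapering of linkage disequilibrium can be illustrated with the usual coefficient D and its expected decay under random mating. The Python sketch below assumes a large randomly mating population with no selection or drift; the haplotype and allele frequencies are hypothetical.

```python
def ld_coefficient(p_ab, p_a, p_b):
    """D = P(AB haplotype) - P(A) * P(B); zero at linkage equilibrium."""
    return p_ab - p_a * p_b

def ld_after(d0, theta, generations):
    """Expected D after random mating: D_t = (1 - theta)^t * D_0."""
    return d0 * (1.0 - theta) ** generations

d0 = ld_coefficient(p_ab=0.4, p_a=0.5, p_b=0.5)   # hypothetical frequencies
for theta in (0.001, 0.01, 0.1):
    decay = [round(ld_after(d0, theta, t), 4) for t in (0, 10, 100, 1000)]
    print(f"theta = {theta}: D at 0, 10, 100, 1000 generations = {decay}")
```

Tightly linked markers (small theta) keep their association with the disease allele for many generations, which is why a nearby marker can stand in for an untyped causal variant.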
Association Studies
• Linkage disequilibrium varies among populations
  – Depends on population structure, age
    • coalescent
  – Europeans have a lot, African populations only a little
  – Population of human origin is more diverse, older
• Need dense, cheap markers over genome: Genome Wide Association Studies (GWAS)
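A single-marker test in such a study can be as simple as a chi-square test on allele counts in cases versus controls. The Python sketch below is one minimal version, with a Bonferroni correction for the number of markers tested; the counts and the number of markers are hypothetical, and the correction ignores the dependence between nearby markers noted earlier.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.

    Each argument is (count of allele A, count of allele a).
    Returns the statistic and its p-value.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical allele counts: 1000 cases and 1000 controls, two alleles each.
chi2, p_value = allelic_chi_square((1100, 900), (1000, 1000))
n_markers = 500_000                       # hypothetical number of markers tested
significant = p_value < 0.05 / n_markers  # Bonferroni-corrected threshold
```

A difference that looks convincing for one marker does not survive correction for hundreds of thousands of tests, which is part of why GWAS need very large samples.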
QTL and GWAS
• Quantitative Traits, polygenic traits that are assumed to have additive effects
  – Height, heart disease
  – Quixotic Trait Loci?
• Each gene has a small effect
• Huge genotyping efforts now paying off
• BUT only a small fraction of genetic component is accounted for even in huge studies
  – Tradeoffs of including broader human population
Common Disease versus Rare Variants
• Common disease, common variants: The most frequently occurring alleles/SNPs should explain most of the etiology of a disease.
  – Current studies do NOT show this to be the case.
• Newer paradigm: rare variants
  – Occur less frequently but have larger associations with disease
Sullivan, Daly and O'Donovan, Nature Reviews Genetics, 2012
• Different results in different populations
• Heritability
  – What makes a gene matter to a disease?
  – Take advantage of human phenotyping
  – What genes CAN contribute to disease or modification of disease?
• A golden age of personal genomics?
Acknowledgments
• David Pollock, Biochemistry and Molecular Genetics