Introduction to Genetics Debashis Ghosh Professor and Chair, Biostatistics and Informatics, ColoradoSPH

Page 1

Introduction to Genetics

Debashis Ghosh, Professor and Chair,

Biostatistics and Informatics, ColoradoSPH

Page 2

Question we tackle today

• What do we mean by a gene?
• Steve Mount (ongenetics.blogspot.com): “A gene is all of the DNA elements required in cis for the properly regulated production of a set of RNAs whose sequences overlap in the genome.”
• Mark Gerstein (2007, Genome Research): “The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.”

Page 3

What is a gene?

• No “one-size-fits-all” definition
• The previous definitions are useful to contextualize data that are generated from experiments
• Thinking carefully about evolution and the constraints it has placed on functions is also important

Page 4

From Genotype to Phenotype

• Full genotypes (genomes) are coming… But inheritance is complex
• Genetic markers are characters inherited in a way that is simple enough to easily track
• Want to find genetic markers that explain or predict phenotypes
  – e.g., disease, susceptibility
  – Ideally, the marker would be causative
    • But that is rare

Page 5

Alleles as Genes

• At each gene locus, we have two alleles, one transmitted to us by our father, and one by our mother.

• Usual assumption: Each parent randomly transmits one of his/her alleles to the child

• In real datasets, these alleles are usually observed as DNA variants called single-nucleotide polymorphisms (SNPs)

Page 6

Diploid Inheritance

Figure: two diploid genotypes, each with one allele from Mom and one from Dad. One is a heterozygote, the other a homozygote.

Page 7

Phenotypic Dominance

Figure: a heterozygote (one allele from Mom, one from Dad) under three scenarios: light blue dominant and dark blue recessive; dark blue dominant and light blue recessive; mixed dominance.

Page 8

Diploid Inheritance

Figure: heterozygote and homozygote when dark blue is dominant. The recessive phenotype is only visible in the homozygote.

Page 9

Mendelian Ratios
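The ratios named in this slide title follow from the random-transmission assumption and from dominance. As a minimal sketch (the allele labels `A`/`a` and the function names are illustrative assumptions, not taken from the slides), the offspring of a cross can be enumerated directly:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(mother, father):
    """Enumerate offspring genotypes when each parent transmits one allele at random."""
    # Sorting the two letters makes "aA" and "Aa" the same genotype.
    counts = Counter("".join(sorted(m + f)) for m, f in product(mother, father))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}

def phenotype_ratio(genotype_probs, dominant="A"):
    """Collapse genotype probabilities into dominant/recessive phenotype probabilities."""
    out = Counter()
    for g, p in genotype_probs.items():
        out["dominant" if dominant in g else "recessive"] += p
    return dict(out)

# Cross of two heterozygotes
genotypes = cross("Aa", "Aa")
phenotypes = phenotype_ratio(genotypes)
print(genotypes)
print(phenotypes)
```

For two heterozygous parents this gives the 1:2:1 genotype ratio and, with complete dominance, the 3:1 phenotype ratio.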

Page 10

Recombination

Figure: a chromosomal segment in Mom (she’s a diploid, remember), with one copy from Grandma and one from Grandpa; and a chromosomal segment in you (you’re diploid too), with one copy from Mom and one from Dad.

Page 11

Crossing Over

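Figure: Mom’s chromosomes from Grandma and from Grandpa. Non-sister chromatids of these homologous chromosomes recombine (cross over) during meiosis. Of the products of meiosis, one is inherited by you; the others are lost (except in tetrad analysis).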

Page 12

Recombination: Basic Points

• Recombination switches which chromosome in the parent (i.e., originating from which grandparent) is passed along to the offspring

• Alleles physically adjacent on a chromosome are more likely to be passed on together than alleles far apart

• Alleles very far apart or on different chromosomes are inherited independently of each other
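The three points above can be made concrete with gamete probabilities. This is a minimal sketch, assuming a parent who carries haplotype `AB` from one grandparent and `ab` from the other, and a recombination fraction `theta` between the two loci (the names are illustrative, not from the slides):

```python
def gamete_probs(theta):
    """Gamete probabilities for a parent with haplotypes AB and ab,
    given recombination fraction theta between the two loci."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    return {
        "AB": (1 - theta) / 2,  # non-recombinant
        "ab": (1 - theta) / 2,  # non-recombinant
        "Ab": theta / 2,        # recombinant
        "aB": theta / 2,        # recombinant
    }

for theta in (0.0, 0.1, 0.5):
    probs = gamete_probs(theta)
    together = probs["AB"] + probs["ab"]
    print(f"theta={theta}: P(grandparental alleles passed on together) = {together:.2f}")
```

Adjacent loci (small `theta`) are almost always passed on together, while `theta = 0.5` gives all four gametes equal probability, which is the independent-inheritance case.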

Page 13

Finding Disease Genes

• Assemble data set of probands
• Assemble data set of control population
• Might have pedigree if disease runs in families
• Might have trios to determine linkage
  – Proband plus two parents
• Look for linkage between genetic markers and disease
  – In pedigree
  – In dataset of less related individuals

Page 14

Genetic Markers
• Polymorphic in population
  – Different variants in different individuals
  – Single Nucleotide Polymorphism (SNP)
  – Variable Number of Tandem Repeats (VNTR)
    • Minisatellites
  – Short Tandem Repeats (STR)
    • Microsatellites
    • Very high mutation rate: strand slippage
• Haplotype
  – A set of closely linked SNPs inherited as a unit

Page 15

Linkage Analysis

• Set of variable markers distributed throughout genome

• Identify linkage regions (haplotypes) that cosegregate (are inherited together) with disease or trait

Page 16

Pedigree Analysis
• Tabulate the occurrence of a trait in an extended family
  – Pedigree is family’s mating history

Page 17

Assumptions and Complications

• Single gene with Mendelian inheritance
  – Best use of extended families
  – Few extended families with trait
• Quantitative traits are multigenic
  – Includes most widespread or “common” inherited diseases
  – Sib pairs are best for complex traits with incomplete penetrance (see next slide)

Page 18

Incomplete Penetrance

• Not everyone with the disease genotype will have the disease
  – Delayed or adult onset
  – Mild or undetectable symptoms
  – Environmental and developmental factors
  – Unknown genetic factors
• Disease allele = increased probability of disease (relative risk)
• We don’t always know who in the pedigree has the disease genotype!

Page 19

Evaluating Linkage

• Remember, an individual is a recombinant with respect to two genes, A and B, if it inherits the allele from one parental chromatid at A and the allele from the other parental chromatid at B

• The recombination fraction $\theta$ is the probability that a child is recombinant

• If A and B are tightly linked, then $\theta$ is small

Page 20

Simple LOD Scores

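• Total number of offspring, P
• Number of recombinant offspring, R
• Likelihood of the data: $L(\theta) = \theta^{R} (1-\theta)^{P-R}$
• Maximum likelihood estimate: $\hat{\theta} = R/P$
• LOD score for linkage in pedigree is $\mathrm{LOD} = \log_{10} \dfrac{L(\hat{\theta})}{L(1/2)}$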
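A minimal sketch of this calculation follows. It assumes the standard phase-known model in which each of the P offspring is recombinant with probability theta, so the likelihood is theta^R (1 - theta)^(P - R) and the LOD score compares the maximum likelihood estimate against theta = 1/2 (no linkage). The function names and the example counts are hypothetical.

```python
import math

def log10_likelihood(theta, recombinants, offspring):
    """log10 of theta^R * (1 - theta)^(P - R), for 0 <= theta <= 0.5."""
    r, n = recombinants, offspring - recombinants
    if theta == 0.0:
        # theta = 0 is only compatible with observing no recombinants
        return 0.0 if r == 0 else -math.inf
    return r * math.log10(theta) + n * math.log10(1 - theta)

def lod_at_mle(recombinants, offspring):
    """Return (theta_hat, LOD), with theta_hat = R / P capped at 0.5."""
    theta_hat = min(recombinants / offspring, 0.5)
    lod = (log10_likelihood(theta_hat, recombinants, offspring)
           - log10_likelihood(0.5, recombinants, offspring))
    return theta_hat, lod

# Hypothetical small pedigree: 2 recombinants among 10 offspring
theta_hat, lod = lod_at_mle(2, 10)
print(f"theta_hat = {theta_hat:.2f}, LOD = {lod:.3f}")
```

The cap at 0.5 is a design choice: a recombination fraction above one half has no genetic interpretation, so the estimate is restricted to the interval [0, 0.5].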

Page 21

Complications
• Need to know phase, genotypes of parents, to identify recombinants
  – Can estimate informativeness of additional data depending on heterozygosity of markers
• Many disease versus marker comparisons are involved
  – Multiple comparisons
  – But, markers are not independent
• Population structure
• LOD scores > 3 (odds of 1000:1 in favor of linkage) are the usual rule of thumb; > 5 is very strong

Page 22

Population structure

• Genetic markers have different frequency patterns in different populations; this can confound associations between genetic markers and disease phenotypes.
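A small numerical illustration of this confounding, using made-up counts: within each of two hypothetical populations the marker allele has no effect on disease risk, but the populations differ in both allele frequency and disease risk, so pooling them produces a spurious association.

```python
from fractions import Fraction

def odds_ratio(case_carrier, case_non, control_carrier, control_non):
    """Odds ratio for carrying the marker allele, cases versus controls."""
    return Fraction(case_carrier * control_non, case_non * control_carrier)

# Hypothetical counts, 1000 people per population:
# (cases with allele, cases without, controls with allele, controls without)
population_1 = (80, 20, 720, 180)    # allele common (80%), disease risk 10% for everyone
population_2 = (60, 240, 140, 560)   # allele rare (20%), disease risk 30% for everyone
pooled = tuple(a + b for a, b in zip(population_1, population_2))

or_1 = odds_ratio(*population_1)
or_2 = odds_ratio(*population_2)
or_pooled = odds_ratio(*pooled)
print(f"OR population 1: {float(or_1):.2f}")
print(f"OR population 2: {float(or_2):.2f}")
print(f"OR pooled:       {float(or_pooled):.2f}")
```

The odds ratio is exactly 1 in each population but well below 1 in the pooled data, even though the allele does nothing. This is why association analyses usually adjust for ancestry in some way.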

Page 23

Realistic Complications

• Include Penetrance(X|G)
  – Likelihood of observing trait X given the genotype G
• Prior(G)
  – Likelihood of observing the genotype in an individual
• Transmit(Gm|Gk,Gl,$\theta$)
  – Probability that offspring will have genotype Gm given parental genotypes Gk and Gl, and the recombination parameter $\theta$
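To show how these three pieces combine, here is a minimal sketch for a single trio (father, mother, child) at one disease locus. It is not the slides' model: it assumes Hardy-Weinberg proportions for Prior(G), codes a genotype as the number of disease alleles (0, 1, or 2), and drops the recombination parameter because only one locus is involved. All names and numbers are hypothetical.

```python
from itertools import product

def prior(g, q):
    """Hardy-Weinberg probability of carrying g copies of a disease allele with frequency q."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2][g]

def transmit(child, father, mother):
    """Mendelian probability of the child's genotype given both parents' genotypes."""
    pf, pm = father / 2, mother / 2  # chance each parent passes on the disease allele
    return [(1 - pf) * (1 - pm), pf * (1 - pm) + (1 - pf) * pm, pf * pm][child]

def penetrance(affected, g, f):
    """P(observed affection status | genotype), where f[g] = P(affected | g copies)."""
    return f[g] if affected else 1 - f[g]

def trio_likelihood(father_aff, mother_aff, child_aff, q, f):
    """Sum over all unobserved genotype combinations of Prior x Transmit x Penetrance."""
    total = 0.0
    for gf, gm, gc in product(range(3), repeat=3):
        total += (prior(gf, q) * prior(gm, q) * transmit(gc, gf, gm)
                  * penetrance(father_aff, gf, f)
                  * penetrance(mother_aff, gm, f)
                  * penetrance(child_aff, gc, f))
    return total

# Fully penetrant recessive disease, allele frequency 0.1:
# probability of two unaffected parents with an affected child
recessive = (0.0, 0.0, 1.0)
lik = trio_likelihood(False, False, True, q=0.1, f=recessive)
print(f"{lik:.4f}")
```

Summing over genotypes is what lets the likelihood handle the fact that we do not observe who carries the disease genotype. With a marker locus added, Transmit would also depend on the recombination parameter.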

Page 24

LOD Graph

• Can look at LOD score over a range of $\theta$ values, not just the MLE.

• Usual assumption is LOD > 3 is evidence for linkage, LOD < -2 is evidence for exclusion

Example: 27 recombinants out of 139 gametes (example from S. Purcell)
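The curve for this example can be traced with a short scan. The sketch below assumes the simple binomial likelihood theta^R (1 - theta)^(P - R) with the counts from the example above; the grid spacing and variable names are arbitrary choices.

```python
import math

RECOMBINANTS, GAMETES = 27, 139  # counts from the example on this slide

def lod(theta, r=RECOMBINANTS, n=GAMETES):
    """LOD(theta) = log10 L(theta) - log10 L(1/2), for 0 < theta <= 0.5."""
    log_l = r * math.log10(theta) + (n - r) * math.log10(1 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l - log_l_null

thetas = [i / 100 for i in range(1, 51)]  # grid 0.01, 0.02, ..., 0.50
curve = [(t, lod(t)) for t in thetas]
best_theta, best_lod = max(curve, key=lambda pair: pair[1])

linked = [t for t, score in curve if score > 3]     # evidence for linkage
excluded = [t for t, score in curve if score < -2]  # evidence for exclusion

print(f"grid maximum at theta = {best_theta:.2f}, LOD = {best_lod:.2f}")
print(f"LOD > 3 for theta from {min(linked):.2f} to {max(linked):.2f}")
print(f"LOD < -2 at theta values: {excluded}")
```

Scanning the whole range shows which recombination fractions the data support and which they exclude, something a single value at the MLE cannot show.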

Page 25

Recombination Probability and Distance along Chromosome

• Recombination fraction does not increase linearly with distance
  – Multiple recombination events possible over greater distances, but also interference
• Can estimate genetic distance from recombination rates
  – Measure in Morgans, or cM
  – Genetic distance, the expected number of crossovers, is additive

Page 26

Mapping Functions

• Haldane’s mapping function
  – Crossovers are assumed random and independent
• Kosambi’s mapping function
  – Models interference: crossovers not too close
  – Most popular
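For reference, the two mapping functions in their usual textbook forms, with genetic distance d in Morgans and recombination fraction theta. The slides do not give the formulas, so treat this as a sketch of the standard definitions; the function names are illustrative.

```python
import math

def haldane_theta(d):
    """Haldane: theta = (1 - exp(-2d)) / 2, crossovers independent (no interference)."""
    return 0.5 * (1 - math.exp(-2 * d))

def haldane_distance(theta):
    """Inverse Haldane: d = -ln(1 - 2*theta) / 2, for theta < 0.5."""
    return -0.5 * math.log(1 - 2 * theta)

def kosambi_theta(d):
    """Kosambi: theta = tanh(2d) / 2, allowing for crossover interference."""
    return 0.5 * math.tanh(2 * d)

def kosambi_distance(theta):
    """Inverse Kosambi: d = ln((1 + 2*theta) / (1 - 2*theta)) / 4, for theta < 0.5."""
    return 0.25 * math.log((1 + 2 * theta) / (1 - 2 * theta))

for d_cm in (1, 10, 50, 100):
    d = d_cm / 100  # centimorgans to Morgans
    print(f"{d_cm:>3} cM: Haldane theta = {haldane_theta(d):.4f}, "
          f"Kosambi theta = {kosambi_theta(d):.4f}")
```

At short distances both functions give theta close to d, and both approach 0.5 as d grows, which is why the recombination fraction is not additive while map distance is.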

Page 27

Genetic versus Physical

• Mapping is not simple
  – Recombination rate varies along chromosomes
• Male versus female
  – Men: 28.51 Morgans over whole genome
    • 1.05 Mb/cM
  – Women: 42.96 Morgans (excluding X)
    • 0.88 Mb/cM
• In Drosophila, about 0.4 Mb/cM

Page 28

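Modeling Penetrance
• Single locus, three genotypes, with penetrances $f_0$, $f_1$, $f_2$ = P(disease | 0, 1, or 2 copies of the disease allele)
• If $f_0 = 0$ and $f_1 = f_2 = 1$
  – Disease is Mendelian dominant
• If $f_0 = f_1 = 0$ and $f_2 = 1$
  – Disease is Mendelian recessive
• Spontaneous mutations: $f_0 > 0$
• Incomplete penetrance: $f_2 < 1$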
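A minimal sketch of how such a penetrance model translates into population quantities. It assumes a penetrance vector `f` whose entries are P(disease | 0, 1, or 2 copies of the disease allele) and Hardy-Weinberg genotype frequencies; the allele frequency and the penetrance values in the third model are hypothetical.

```python
def genotype_freqs(q):
    """Hardy-Weinberg frequencies for 0, 1, 2 copies of a disease allele with frequency q."""
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)

def prevalence(q, f):
    """Population disease probability: sum over genotypes of frequency times penetrance."""
    return sum(p * fg for p, fg in zip(genotype_freqs(q), f))

def phenocopy_fraction(q, f):
    """Share of affected individuals who carry no copy of the disease allele."""
    return genotype_freqs(q)[0] * f[0] / prevalence(q, f)

MODELS = {
    "Mendelian dominant": (0.0, 1.0, 1.0),
    "Mendelian recessive": (0.0, 0.0, 1.0),
    "sporadic cases, incomplete penetrance": (0.01, 0.5, 0.5),  # hypothetical values
}

q = 0.01  # hypothetical disease allele frequency
for name, f in MODELS.items():
    print(f"{name}: prevalence = {prevalence(q, f):.5f}, "
          f"non-carrier share of cases = {phenocopy_fraction(q, f):.2f}")
```

In the third model about half of the affected individuals carry no disease allele at all, which illustrates why affection status alone does not reveal the disease genotype.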

Page 29

Extending Analysis

• SNPs scattered throughout genome
  – LOD scores for regions, not individual marker
• Multipoint linkage analysis
  – Establish order relationship among 3+ markers
• Non-parametric analysis can be better for complex traits, incomplete penetrance
  – Work with affected siblings
  – Less statistical power than model-based methods

• Identical by descent (IBD) versus chance

Page 30

Non-Parametric

• Concerning siblings or other relatives
  – Need “both affected” and “only one affected” pairs
• Correlate shared IBD alleles with affected state, proportion in two classes
  – High correlation means linkage to disease
Mention T1D (type 1 diabetes)
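One common way to make this comparison concrete is the mean IBD-sharing statistic for sib pairs. The sketch below is an illustration under stated assumptions, not the method used in the lecture: with no linkage, a sib pair shares 0, 1, or 2 alleles IBD with probabilities 1/4, 1/2, 1/4, so the expected sharing proportion is 0.5 with variance 1/8 per pair. The counts are made up.

```python
import math

def mean_ibd_sharing(n0, n1, n2):
    """Average proportion of alleles shared IBD, given counts of pairs sharing 0, 1, 2."""
    total = n0 + n1 + n2
    return (n1 + 2 * n2) / (2 * total)

def mean_test_z(n0, n1, n2):
    """z statistic against the no-linkage expectation of 0.5 (variance 1/8 per pair)."""
    total = n0 + n1 + n2
    return (mean_ibd_sharing(n0, n1, n2) - 0.5) / math.sqrt(1 / (8 * total))

# Hypothetical counts of sib pairs sharing 0, 1, 2 alleles IBD at a marker
both_affected = (15, 50, 35)
only_one_affected = (30, 50, 20)

for label, counts in (("both affected", both_affected),
                      ("only one affected", only_one_affected)):
    print(f"{label}: mean sharing = {mean_ibd_sharing(*counts):.2f}, "
          f"z = {mean_test_z(*counts):.2f}")
```

Excess sharing among pairs where both are affected, together with reduced sharing among pairs where only one is affected, is the pattern expected when the marker is linked to a disease locus.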

Page 31

(Genomewide) Association Studies

• Correlate markers with disease over a large population
• Marker may be the disease variant itself (rare)
• Large regions of chromosome in linkage disequilibrium with disease allele
  – Marker is in disease gene haplotype
• Regions of chromosome tend to be inherited as a unit
  – Tapers off over time due to recombination
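The tapering-off in the last point has a simple textbook form: under random mating, linkage disequilibrium between two loci shrinks by a factor (1 - theta) each generation. The sketch below assumes that idealized model; the function names and example values are illustrative.

```python
import math

def ld_after(generations, theta, d0=1.0):
    """Linkage disequilibrium after t generations: D_t = D_0 * (1 - theta)^t."""
    return d0 * (1 - theta) ** generations

def generations_to_halve(theta):
    """Number of generations for disequilibrium to fall to half its starting value."""
    return math.log(0.5) / math.log(1 - theta)

for theta in (0.001, 0.01, 0.1, 0.5):
    print(f"theta = {theta}: D after 100 generations = {ld_after(100, theta):.4f} x D_0, "
          f"half-life = {generations_to_halve(theta):.1f} generations")
```

Markers very close to a disease allele therefore stay correlated with it for many generations, which is what association studies exploit, while distant or unlinked markers lose the correlation quickly.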

Page 32

Association Studies
• Linkage disequilibrium varies among populations
  – Depends on population structure, age
    • coalescent
  – Europeans have a lot, African populations only a little
  – Population of human origin is more diverse, older

• Need dense, cheap markers over genome: Genome Wide Association Studies (GWAS)

Page 33

QTL and GWAS
• Quantitative Traits, polygenic traits that are assumed to have additive effects
  – Height, heart disease
  – Quixotic Trait Loci?
• Each gene has a small effect
• Huge genotyping efforts now paying off
• BUT only a small fraction of genetic component is accounted for even in huge studies
  – Tradeoffs of including broader human population

Page 34

Common Disease versus Rare Variants

• Common disease, common variants: The most frequently occurring alleles/SNPs should explain most of the etiology of a disease.
  – Current studies do NOT show this to be the case.
• Newer paradigm: rare variants
  – Occur less frequently but have larger associations with disease

Page 35

Sullivan, Daly and O'Donovan, Nature Reviews Genetics, 2012

Page 36

• Different results in different populations

• Heritability
  – What makes a gene matter to a disease?
  – Take advantage of human phenotyping
  – What genes CAN contribute to disease or modification of disease?

• A golden age of personal genomics?

Page 37

Acknowledgments

• David Pollock, Biochemistry and Molecular Genetics