Mining your Personal Genome
![Page 1: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/1.jpg)
Jieming Chen Yale University
CBB752a12
Mining your Personal Genome
![Page 2: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/2.jpg)
What is Personal Genomics?
• Personal genomics is the branch of genomics concerned with the sequencing and analysis of the genome of an individual -- Wikipedia
• Was it not possible before?
- Genetics vs. genomics
- Post-Human-Genome-Project (HGP) genomics
![Page 3: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/3.jpg)
[Figure: timeline with marks at 2000, 2003, 2006, 2008 and 2010. Source: Nature (2010)]
![Page 4: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/4.jpg)
Personal Genomics
1. From basic research, to clinic, then to the masses
2. Tools to mine your own genome
3. Ethics and Privacy
![Page 5: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/5.jpg)
GENOMICS IN BASIC RESEARCH
Increasingly “personalized” genomics…
![Page 6: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/6.jpg)
Before mass sequencing: mass genotyping
• Genotyping
- Determination of the genotypes of parts of an individual’s genome (usually genetic variations) using biological assays
• SNP arrays / hybrid arrays
- SNP (single nucleotide polymorphism) genotyping
- Main players: Affymetrix vs. Illumina
Affymetrix: http://www.affymetrix.com/
Illumina: http://www.illumina.com/
![Page 7: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/7.jpg)
Before mass sequencing: mass genotyping
• Array CGH (comparative genomic hybridization)
- CNV (copy number variation) genotyping
- Main players: Agilent vs. NimbleGen
- Main applications: detection of genomic abnormalities in cancer; detection of large structural aberrations (especially at the chromosomal level)
![Page 8: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/8.jpg)
SNP arrays
• Probes on microarray technology
• Affymetrix: 1K, 10K (Xba), 50K (Hind, Xba), 100K, 250K (Nsp, Sty), 500K, SNP 5.0, SNP 6.0, Axiom
• Illumina: 100K, 240K, 300K, 550K, 610K, 650K, 1M, Omni
Affymetrix Axiom Solutions: http://www.affymetrix.com/
![Page 9: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/9.jpg)
SNP selection in array design
1) SNP quantity
- limited by microarray technology
2) SNP content
- random probes or probes for ‘tag’ SNPs
- random probes are produced by specific enzymes in some array technologies
- a ‘tag’ SNP is one that represents a group of SNPs in a genomic region, owing to a phenomenon called linkage disequilibrium (LD)
- LD refers to the non-random association of alleles at 2 or more loci
- a haplotype is a configuration of alleles that are transmitted together (or assumed to be)
- one can, in theory, predict the larger group of SNPs from a smaller set of tag SNPs
![Page 10: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/10.jpg)
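Linkage Disequilibrium
• High LD -> no recombination (r² = 1): SNP1 “tags” SNP2
- parental haplotypes A B and a b are passed on intact, so only AB and ab are observed
• Low LD -> recombination: many possibilities
- haplotypes such as AB, Ab, aB and ab are all observed, etc.
[Figure: crosses between Parent 1 and Parent 2 showing the haplotype combinations under high and low LD]
ASHG 2008 HapMap Tutorial: http://hapmap.ncbi.nlm.nih.gov/tutorials.html.en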
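The two cases on this slide can be checked numerically. The following is a minimal sketch, assuming phased two-SNP haplotypes written as two-character strings in the slide's A/a and B/b notation; the function name `ld_r2` and the toy haplotype lists are illustrative and not part of the original lecture.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic SNPs from phased two-SNP haplotypes.

    Each haplotype is a 2-character string such as "AB" or "ab"
    (upper/lower case mark the two alleles at each SNP).
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n   # frequency of allele A at SNP1
    p_b = sum(h[1] == "B" for h in haplotypes) / n   # frequency of allele B at SNP2
    p_ab = sum(h == "AB" for h in haplotypes) / n    # frequency of haplotype AB
    d = p_ab - p_a * p_b                             # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


high_ld = ["AB", "AB", "ab", "ab"]   # only two haplotypes: SNP1 tags SNP2
low_ld = ["AB", "Ab", "aB", "ab"]    # all four haplotypes equally common

print(ld_r2(high_ld))  # 1.0
print(ld_r2(low_ld))   # 0.0
```

With r² = 1, genotyping SNP1 alone is enough to infer SNP2, which is the basis of tag-SNP array design.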
![Page 11: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/11.jpg)
The International HapMap Project
• Largely exploited the idea of haplotypes and LD
- reduce cost (sequencing is expensive)
- capitalize on microarray technology
• Involved Illumina, Affymetrix and >20 institutions worldwide
• HapMap1 (2003) and HapMap2 (2005)
- 4 populations (270 individuals): CEU (NW European from Utah), CHB (Han Chinese from Beijing), JPT (Japanese from Tokyo), YRI (Yoruba from Nigeria)
• HapMap3 (2010)
- 11 populations (4+7, 1301 individuals)
![Page 12: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/12.jpg)
The International HapMap Project
• Provided the foundation for future human genomic projects:
- maturation of the microarray technology
- tool development from industry and academia
- the use of common variations in disease studies and genome-wide association studies (GWAS)
- population-specific genetic differences
- samples
- consent and ethical issues
• Major limitations:
1) coverage (the entire genome is not covered)
2) rare variants are unlikely to be uncovered
3) population-based genome-wide studies
www.hapmap.org
![Page 13: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/13.jpg)
Even with limited information, genomics is getting “personalized”…
Basic
• Human reference genome refinement
• Human evolution and natural selection
• Comparative genomics
Ancestry of individuals
• Population structure
• Human migration route
• Haplotyping
• Linguistics
Clinical applications
• Pharmacogenetics/genomics
• Disease associations
ETC. ETC. ETC……
HUGO PASNP Consortium (2009), Science
![Page 14: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/14.jpg)
Heralding the personal genomes
• HapMap3 draft 1 came out in 2009 and the paper was published in 2010
• Venter genome (2007) and Watson genome (2008)
• Faster, cheaper and more accurate sequencing technologies -> transitioning into personal genomes
• 2009-2011: the 1000 Genomes Project sequenced 1092 genomes from 14 different populations
![Page 15: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/15.jpg)
[Figure: timeline with entries dated 2007, 2008 (3 entries), 2009 (6 entries) and 2010]
![Page 16: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/16.jpg)
Further into the personal genome
• Beyond simply sequencing the personal genome
• If a family trio is sequenced (mum, dad, child), one can potentially phase the variations of the child into its maternal and paternal alleles.*
• Phasing refers to the determination of the haplotype of an individual’s sequence.
• It can be done experimentally (not feasible for large-scale phasing) or computationally.
• Typical computational phasing algorithms include the use of HMM (e.g. BEAGLE, Browning & Browning 2007, AJHG) and EM (e.g. fastPHASE, Scheet & Stephens 2006, AJHG).
*Note that phasing can also be done with unrelated individuals, but you won’t know which chromosome is maternal and which is paternal
![Page 17: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/17.jpg)
Phasing

| Parent 1 | Parent 2 | Child | Informative to phase child’s genome? |
| --- | --- | --- | --- |
| Homozygous | Homozygous | Any | Yes |
| Homozygous | Heterozygous | Any | Yes |
| Heterozygous | Homozygous | Any | Yes |
| Heterozygous | Heterozygous | Homozygous | Yes |
| Heterozygous | Heterozygous | Heterozygous | No |

[Figure: Father with haplotypes ABcD and aBcd, Mother with haplotypes ABCd and aBcd, and the Child’s sequence resolved into paternal and maternal alleles]
Simple example of phased sequence of the child (as opposed to ‘unphased’, highlighted black)
Adapted from: http://www.chromosomechronicles.com/2009/09/30/use-family-snp-data-to-phase-your-own-genome/
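The rules in the table above can be written as a per-site check. This is a minimal sketch, not BEAGLE or fastPHASE: it assumes unphased genotypes given as two-character allele strings, treats each site independently, and uses the hypothetical helper names `phase_site` and `phase_child`.

```python
def phase_site(father, mother, child):
    """Phase one site of the child using the parents' genotypes.

    Genotypes are 2-character allele strings, e.g. "Aa".
    Returns (paternal_allele, maternal_allele), or None when the site is
    uninformative (or inconsistent with Mendelian inheritance).
    """
    target = sorted(child)
    # Every (allele from father, allele from mother) pair that yields the child genotype
    options = {(p, m) for p in father for m in mother if sorted((p, m)) == target}
    return options.pop() if len(options) == 1 else None


def phase_child(father, mother, child):
    """Apply phase_site to each site of a trio, in order."""
    return [phase_site(f, m, c) for f, m, c in zip(father, mother, child)]


# Toy trio covering rows of the table (made-up genotypes)
father = ["AA", "Aa", "Aa", "Aa"]
mother = ["aa", "AA", "Aa", "Aa"]
child = ["Aa", "Aa", "AA", "Aa"]
print(phase_child(father, mother, child))
# [('A', 'a'), ('a', 'A'), ('A', 'A'), None]
```

Only the last site, where all three individuals are heterozygous, stays unresolved; statistical phasing methods use neighbouring sites to fill in such gaps.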
![Page 18: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/18.jpg)
Allele-specific binding (ASB) and expression (ASE)
Possible causes for ASB/ASE
1) Epigenetic effects, e.g. imprinting, where methylation silences a maternal/paternal gene
2) Genetic variations (such as SNPs) disrupting a binding motif or modifying a gene on a single parental haplotype
3) Random mono-allelic expression/binding
Clinical examples
1) Angelman Syndrome: maternal gene(s) on chromosome 15 inactivated or deleted, paternal gene imprinted
2) Prader-Willi Syndrome: paternal gene(s) on chromosome 15 inactivated or deleted, maternal gene imprinted
Using a phased genome to study ASB and ASE
• Integrate phased sequence with ChIP-seq (binding) and RNA-seq (expression) data to obtain allele-specific information in binding and expression (Rozowsky J et al. 2011)
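Once reads are assigned to the maternal or paternal allele at a heterozygous site, one simple way to flag allele-specific signal is a two-sided binomial test against an expected 50:50 split. The sketch below assumes that approach for illustration only; it is not claimed to be the procedure of the cited paper, and the read counts are made up.

```python
from math import comb


def allelic_imbalance_pvalue(maternal_reads, paternal_reads):
    """Two-sided exact binomial test of a 50:50 allelic ratio."""
    n = maternal_reads + paternal_reads
    smaller = min(maternal_reads, paternal_reads)
    # Probability of a split at least as lopsided as observed, in one direction
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    # Double for the two-sided test; cap at 1 for balanced counts
    return min(1.0, 2 * tail)


print(allelic_imbalance_pvalue(10, 10))  # 1.0 (balanced, no evidence of ASE)
print(allelic_imbalance_pvalue(0, 10))   # 0.001953125 (all reads from one allele)
```

In a genome-wide analysis many sites are tested, so the resulting p-values would still need a multiple-testing correction.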
![Page 19: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/19.jpg)
PERSONAL GENOMICS IN CLINICAL RESEARCH
“Personalization in progress… Watch this space”
![Page 20: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/20.jpg)
Personal genomics in Clinic
Some areas that clinicians are interested in that genomics can potentially improve:
• Disease prediction
• Pharmacogenetics/genomics
• Response to therapy
• Patient care (personalized environmental and epigenetic information, patient data privacy, etc.)
• Personalized medicine and healthcare
Examples of some genomic technologies in clinical research
1) Genome-Wide Association Studies
2) Exome sequencing
3) Pharmacogenetics/genomics
4) Gene expression profiles via RNA-seq
McCarthy et al. 2008
![Page 21: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/21.jpg)
Genome-Wide Association Studies (GWAS)
• First successful GWAS was done at Yale in 2005, for age-related macular degeneration (AMD) (Klein R. et al. 2005, Science)
- 96 cases, 50 controls, 116K SNPs
Klein R. et al. 2005
![Page 22: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/22.jpg)
GWAS
• Propelled by HapMap and microarray technology
• Hypothesis-free
• Main aims:
1) to find the molecular pathways/mechanisms of complex diseases/traits
2) to find genetic markers that these phenotypes are associated with
• Common-disease-common-variant hypothesis
- phenotypes result from the cumulative effects of a number of common variants, each with at best a modest effect size
McCarthy et al. 2008, Nature Reviews
![Page 23: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/23.jpg)
GWAS
• Usually SNP-based
• Conduct an association test for each SNP between cases and controls to see if there is a significant difference between the 2 cohorts.

| Allele | # Cases | # Controls |
| --- | --- | --- |
| A | n_A,case | n_A,ctrl |
| B | n_B,case | n_B,ctrl |

where n is the number of times the given allele is observed in the given cohort.
McCarthy et al. 2008, Nature Reviews
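A per-SNP test on the 2×2 allele count table above can be sketched as follows. This assumes a plain Pearson chi-square test with 1 degree of freedom plus an odds ratio, which is one common choice rather than necessarily the statistic on the original slide; the function name `allelic_test` and the counts are made up.

```python
import math


def allelic_test(n_a_case, n_b_case, n_a_ctrl, n_b_ctrl):
    """Chi-square test (1 d.f.) and odds ratio for a 2x2 allele count table."""
    total = n_a_case + n_b_case + n_a_ctrl + n_b_ctrl
    row_a, row_b = n_a_case + n_a_ctrl, n_b_case + n_b_ctrl
    col_case, col_ctrl = n_a_case + n_b_case, n_a_ctrl + n_b_ctrl
    cross = n_a_case * n_b_ctrl - n_a_ctrl * n_b_case
    chi2 = total * cross ** 2 / (row_a * row_b * col_case * col_ctrl)
    # For 1 d.f., the chi-square tail equals a two-sided normal tail
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (n_a_case * n_b_ctrl) / (n_a_ctrl * n_b_case)
    return chi2, p_value, odds_ratio


# Toy counts: allele A seen 60 times in cases, 40 times in controls
chi2, p_value, odds_ratio = allelic_test(60, 40, 40, 60)
print(chi2, odds_ratio)  # 8.0 2.25
print(p_value < 0.05)    # True, before any multiple-testing correction
```

Because the test is repeated for every SNP on the array, the significance threshold in a real study is far stricter than 0.05.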
![Page 24: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/24.jpg)
GWAS Limitations
• Note that even though termed “whole genome”, GWAS until now have worked mostly with microarray technology
- they use ‘tag SNPs’, which are in LD with many other SNPs
- GWAS may not (and typically do not) find the causative variant
• High number of false positives with array-based GWAS
- currently, GWAS variants explain only a small fraction of the genetic risk of common diseases
• Heading towards sequencing-based GWAS, especially for uncommon or rare variants
![Page 25: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/25.jpg)
GWAS Limitations (cont’d)
• Results can be population-specific, e.g. Type 2 diabetes risk allele frequencies decrease from Sub-Saharan Africa through Europe to East Asia
However, GWAS did provide new insights into novel disease-associated pathways and mechanisms, for instance in AMD.
Catalog of GWAS: http://www.genome.gov/26525384
Chen R et al. (2012), PLoS Genetics
![Page 26: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/26.jpg)
Pharmacogenetics/genomics
• Pharmacogenetics
- refers to the study of how genetic variation affects individual patient responses to drugs, conventionally in a single gene or a small set of genes
• Pharmacogenomics
- refers to the large-scale/genome-wide study of how genetic variation affects individual patient responses to drugs
![Page 27: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/27.jpg)
Interethnic variations in drug responses
• Warfarin is a classic example.
- a very widely used anticoagulant and one of the most well-studied drugs
- extremely difficult to dose because of a narrow therapeutic window
- genes with haplotypes that affect dosage: VKORC1 and CYP2C9
- Warfarin sensitivity (on average): Asians > Caucasians > African Americans
Rettie A & Tai G (2006), Molecular Interventions Review
![Page 28: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/28.jpg)
Quantifying interethnic variation in the genome: an application
• A popular measure in population genetics is the fixation index, FST, which essentially measures population differentiation.
Chen J et al. (2010), Pharmacogenomics
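As a rough illustration of what FST captures, the sketch below computes it for a single biallelic SNP from per-population allele frequencies, using the heterozygosity form FST = (HT - HS) / HT and assuming equally sized populations. Published analyses, including the cited paper, may use a different estimator; the function name `fst` and the frequencies are made up.

```python
def fst(allele_freqs):
    """FST for one biallelic SNP from per-population allele frequencies.

    Assumes equally sized populations: FST = (H_T - H_S) / H_T.
    """
    p_bar = sum(allele_freqs) / len(allele_freqs)
    h_total = 2 * p_bar * (1 - p_bar)                     # expected heterozygosity, pooled
    h_sub = sum(2 * p * (1 - p) for p in allele_freqs) / len(allele_freqs)
    if h_total == 0:
        return 0.0                                        # SNP is fixed everywhere
    return (h_total - h_sub) / h_total


print(fst([0.5, 0.5]))            # 0.0: identical frequencies, no differentiation
print(round(fst([0.9, 0.1]), 2))  # 0.64: strongly differentiated
```

A drug-response variant with a high FST is one whose frequency differs markedly between populations, which is the situation described for warfarin dosing.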
![Page 29: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/29.jpg)
A peek into a potential future
1. Charcot-Marie-Tooth neuropathy (Lupski et al., 2010, NEJM)
- Whole genome sequencing of the lead author himself, who has the disease, and his family found 2 causative mutations associated with the disease, in a region on chromosome 5 affecting SH3TC2 (SH3 and tetratricopeptide repeats 2 gene)
2. The Snyder Experiment (Chen R et al., 2012, Cell)
- integration of genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles of a single healthy individual over a 14-month period
- revealed a predisposition to Type 2 diabetes despite having no family history
![Page 30: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/30.jpg)
EMPOWERING THE MASSES
“knowledge is mightier, IF you wield it right”
![Page 31: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/31.jpg)
Personal genomics for the Masses
What can you mine from your own genome? How can you mine your own genome?
What can you tell from your own genome?
• Disease susceptibility
• Ancestry
• Pharmacogenetics
• Traits
ETC. ETC.
![Page 32: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/32.jpg)
The Bottom-up Pyramid Information Flow
Public
Clinical research
Basic Research
![Page 33: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/33.jpg)
Personal genomics for the Masses
• Unprecedented accessibility to the public
• Brought about by direct-to-consumer genomic companies
- Big 3: DeCode, Navigenics, 23andMe
![Page 34: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/34.jpg)
23andMe
• Genotypes ~1 million SNPs per genome
• Illumina OmniExpress customized microarray
• Ancestry
• Traits
• Drug response
• Disease risks
• Provides your raw data which you can download
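A downloaded raw data file can be read with a few lines of code. The sketch below assumes a tab-separated text layout with `#` comment lines and four columns (rsid, chromosome, position, genotype) and `--` for no-calls; check the header of your own download before relying on this. The rsIDs and positions in the example are invented placeholders.

```python
import io


def parse_raw_genotypes(handle):
    """Read genotype calls from an open raw data file into a dict keyed by rsid."""
    calls = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):   # skip blank lines and header comments
            continue
        rsid, chrom, position, genotype = line.split("\t")
        calls[rsid] = (chrom, int(position), genotype)
    return calls


# Invented example content standing in for a real download
example = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t1\t2000\tAG\n"
    "rs0000003\tX\t3000\t--\n"
)
calls = parse_raw_genotypes(example)
no_calls = [rsid for rsid, (_, _, genotype) in calls.items() if genotype == "--"]
print(len(calls), no_calls)  # 3 ['rs0000003']
```

For a real file, replace the `io.StringIO` object with `open(path)` on the downloaded text file.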
![Page 35: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/35.jpg)
Beyond 23andMe – Ancestry
Middle East
Inset: http://www.clker.com/clipart-9213.html
Population panels:
• HapMap (Intl HapMap Consortium 2003, Nature)
• HGDP (Li JZ et al. 2008, Science)
• PASNP (HUGO PASNP Consortium 2009, Science)
• SGVP (Teo YY et al. 2009, Gen. Res.)
![Page 36: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/36.jpg)
Beyond 23andMe – Ancestry
Chen J et al. (2009), AJHG
![Page 37: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/37.jpg)
PCA in genomic data
• SNPs in LD can skew PCA
Modified PCA (Price et al. (2006), Nat Genet)
• 0, 1, 2 represent the genotypes of SNPs (0=AA, 1=AB, 2=BB, assuming biallelic SNPs)
• then, instead of normalizing by column, normalize by row
• variables = individuals
• observations = SNPs
• correlation matrix of individuals
• plot PC1 vs PC2 by loadings (variables) instead of by PC scores (observations)

| sample | YOU | CEU1 | CEU2 | CEU3 |
| --- | --- | --- | --- | --- |
| SNP1 | 1 | 1 | 1 | 0 |
| SNP2 | 2 | 1 | 2 | 0 |
| SNP3 | 1 | 2 | 2 | 0 |

(columns = samples, rows = SNPs)
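The row-wise normalization and the individuals-by-individuals matrix can be sketched on the toy genotype table above. This is an illustrative, dependency-free version: it assumes each SNP is centred on its mean and scaled by sqrt(p(1-p)), and it uses plain power iteration where a real analysis would call a linear algebra library. All function names are hypothetical.

```python
import math


def normalize_snps(genotypes):
    """Centre and scale each SNP (row); rows are SNPs, columns are individuals."""
    rows = []
    for row in genotypes:
        mean = sum(row) / len(row)
        p = mean / 2.0                       # estimated frequency of allele B
        if p in (0.0, 1.0):                  # monomorphic SNP carries no information
            continue
        scale = math.sqrt(p * (1.0 - p))
        rows.append([(g - mean) / scale for g in row])
    return rows


def individual_matrix(rows):
    """Individuals x individuals covariance matrix, averaged over SNPs."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / len(rows) for j in range(n)]
            for i in range(n)]


def leading_eigenpair(mat, iters=1000):
    """Largest eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    n = len(mat)
    v = [1.0 + 0.1 * i for i in range(n)]    # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(x * x for x in w))
        if length == 0.0:
            break
        v = [x / length for x in w]
    eigenvalue = sum(v[i] * mat[i][j] * v[j] for i in range(n) for j in range(n))
    return eigenvalue, v


def top_two_pcs(genotypes):
    """Per-individual coordinates on PC1 and PC2."""
    mat = individual_matrix(normalize_snps(genotypes))
    lam1, pc1 = leading_eigenpair(mat)
    n = len(mat)
    # Remove PC1 from the matrix so the next leading eigenvector is PC2
    deflated = [[mat[i][j] - lam1 * pc1[i] * pc1[j] for j in range(n)]
                for i in range(n)]
    _, pc2 = leading_eigenpair(deflated)
    return pc1, pc2


samples = ["YOU", "CEU1", "CEU2", "CEU3"]
genotypes = [[1, 1, 1, 0],    # SNP1
             [2, 1, 2, 0],    # SNP2
             [1, 2, 2, 0]]    # SNP3
pc1, pc2 = top_two_pcs(genotypes)
for name, x, y in zip(samples, pc1, pc2):
    print(f"{name}: PC1={x:+.2f} PC2={y:+.2f}")
```

Each individual gets one coordinate per component, so plotting `pc1` against `pc2` places YOU among the reference samples. With only three SNPs the picture is not meaningful; real analyses use many thousands of SNPs, usually pruned for LD first.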
![Page 38: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/38.jpg)
PCA interpretation
• Genetic differentiation by geography
• Studies have shown cultural, linguistic and historical associations with such patterns
Novembre J et al. 2008, Nature
![Page 39: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/39.jpg)
International Stem Cell Consortium (2011), Nat Biotech
![Page 40: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/40.jpg)
Disease status
• Considerations:
- the population panel on which your results are based
- how well-studied the disease is
![Page 41: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/41.jpg)
Mendelian diseases
• High penetrance
• Highly likely to be detected, hence the results are more likely to be true
• Some populations might have a higher rate
Rosner et al. (2009), Annu. Rev. Genomics Hum. Genet.
![Page 42: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/42.jpg)
Drug Response
![Page 43: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/43.jpg)
Ancestry Neanderthal
![Page 44: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/44.jpg)
Tools to mine your own genome
Projects/software from the public
• Dienekes Pontikos - EURO-DNA-CALC (Dienekes Anthropology Blog): http://dienekes.blogspot.com/2008/06/euro-dna-calc-11-released.html
• Dodecad Project: http://dodecad.blogspot.com
• Eurogenes: http://eurogenes.blogspot.com/
Other resources
• Galaxy (http://galaxy.psu.edu/)
• Interpretome (http://esquilax.stanford.edu/)
• SNPTips Firefox browser extension (http://snptips.5amsolutions.com/)
• SNPedia/Promethease (http://www.snpedia.com/index.php/Promethease)
• A comprehensive list of tools to probe 23andMe data: http://www.23andyou.com/3rdparty
![Page 45: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/45.jpg)
GALAXY
• http://galaxy.psu.edu/
• Web-based platform
• Designed for anybody to use
• Workflow concept
• GALAXY demo
![Page 46: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/46.jpg)
Everybody else
Industry
Academia + Clinic
Genomic elements discovery and annotation
Academia:
• Human Genome Project
• HapMap
• 1000 Genomes Project
Clinic:
• Disease association
• Pharmacogenetics
• Biomarkers
Industry (expedites the democratization process):
• Navigenics
• 23andMe
• deCodeMe
• Illumina
• Affymetrix
![Page 47: Mining your Personal Genome](https://reader036.fdocuments.in/reader036/viewer/2022062323/56815f92550346895dce933a/html5/thumbnails/47.jpg)
Some privacy and ethical issues
• Privacy
- can your identity really be kept anonymous in a research project?
- Li et al. 2004, Science: “Our calculations show that measuring as few as 75 statistically independent SNPs would define a small group that contained the real owner of the DNA.”
• Ethics
- how much, if at all, of your genomic information do you own?
- where do biological relatives stand in all of this?
- genetic discrimination, especially by insurance companies