Lecture II: Genomic Methods Dennis P. Wall, PhD Frederick G. Barr, MD, PhD Deborah G.B. Leonard, MD,...

download Lecture II: Genomic Methods Dennis P. Wall, PhD Frederick G. Barr, MD, PhD Deborah G.B. Leonard, MD, PhD 1TRiG Curriculum: Lecture 2March 2012.

If you can't read please download the document

Transcript of Lecture II: Genomic Methods Dennis P. Wall, PhD Frederick G. Barr, MD, PhD Deborah G.B. Leonard, MD,...

  • Slide 1
  • Lecture II: Genomic Methods Dennis P. Wall, PhD Frederick G. Barr, MD, PhD Deborah G.B. Leonard, MD, PhD 1TRiG Curriculum: Lecture 2March 2012
  • Slide 2
  • Why Pathologists? We have access, we know testing Personalized Risk Prediction, Medication Dosing, Diagnosis/ Prognosis Physician sends sample to Pathology (blood/tissue) Pathologists Access to patients genome Just another laboratory test 2TRiG Curriculum: Lecture 2March 2012
  • Slide 3
  • The path to genomic medicine Sample Collection Testing: Sequencing, Gene chips Analysis Pathologists Access to patients genome Sample Collection 3TRiG Curriculum: Lecture 2March 2012
  • Slide 4
  • What we will cover today: Types of genetic alterations Current and future molecular testing methods Cytogenetics, in situ hybridization, PCR Gene chips Genotyping Expression profiling Copy number variation Next generation sequencing (NGS) Whole genome Transcriptome 4TRiG Curriculum: Lecture 2March 2012
  • Slide 5
  • DNA alterations the small stuff Point mutation Repeat alteration Deletion/Insertion CCTGAGGAG CCTGTGGAG TTCCAG(CAG) 5 CAGCAA GAATTAAGAGAAGCAGAAGCA Example: hemoglobin, beta sickle cell disease Example: epidermal growth factor receptor lung cancer TTCCAG(CAG) 60 CAGCAA Example: huntingtin Huntington disease 5TRiG Curriculum: Lecture 2March 2012
  • Slide 6
  • DNA alterations the bigger stuff Translocation Amplification Deletion/ Insertion Example: 22q11.2 region DiGeorge syndrome Example: 17q21.1 (ERBB2) Breast cancer Example: t(11;22)(q24;q12) Ewings sarcoma 11 22 Der 11 Der 22 6TRiG Curriculum: Lecture 2March 2012
  • Slide 7
  • Previous strategies to detect DNA alterations Cytogenetics: Large indels, amplification, translocations In situ hybridization: large indels, amplification, translocations http://moon.ouhsc.edu EGFR amplification in glioblastoma t(6;15) in woman with repeated abortions http://www.indianmedguru.com 7TRiG Curriculum: Lecture 2March 2012
  • Slide 8
  • Previous strategies to detect DNA alterations PCR-based approaches: Mutations, small indels, repeat alterations, large indels, amplification, translocations Alsmadi OA, et al. BMC Genomics 2003 4:21 Factor V Leiden mutation 8TRiG Curriculum: Lecture 2March 2012
  • Slide 9
  • What we will cover today: Types of genetic alterations Current and future molecular testing methods Cytogenetics, in situ hybridization, PCR Gene chips Genotyping Expression profiling Copy number variation Next generation sequencing (NGS) Whole genome Transcriptome 9TRiG Curriculum: Lecture 2March 2012
  • Slide 10
  • DNA microarray - the basics Purpose: multiple simultaneous measurements by hybridization of labeled probe DNA elements may be: Oligonucleotides cDNAs Large insert genomic clones Microarray is generated by: Printing Synthesis 10TRiG Curriculum: Lecture 2March 2012
  • Slide 11
  • Microarray technologies 11TRiG Curriculum: Lecture 2March 2012
  • Slide 12
  • Organization of a DNA microarray (adapted from Affymetrix) 1.28 cm 12TRiG Curriculum: Lecture 2March 2012
  • Slide 13
  • Hybridization of a labeled probe to the microarray 13 (adapted from Affymetrix) TRiG Curriculum: Lecture 2March 2012
  • Slide 14
  • Detection of hybridization on microarray Light from laser 14 (adapted from Affymetrix) TRiG Curriculum: Lecture 2March 2012
  • Slide 15
  • Hybridization intensities on DNA microarray following laser scanning 15TRiG Curriculum: Lecture 2March 2012
  • Slide 16
  • Overview of SNP array technology LaFramboise T. Nucleic Acids Res. 2009; 37:4181 16TRiG Curriculum: Lecture 2March 2012
  • Slide 17
  • Microarray Applications DNA analysis Polymorphism/mutation detection e.g. Disease susceptibility testing Drug efficacy/sensitivity testing Copy number detection (comparative genomic hybridization) e.g. Constitutional or cancer karyotyping Bacterial DNA e.g. Identification and speciation RNA analysis Expression profiling e.g. Breast cancer prognosis Cancer of unknown primary origin cv 17TRiG Curriculum: Lecture 2March 2012
  • Slide 18
  • Genome-wide association studies of breast cancer microarray with 317,139 SNPs Hung RJ, et al. Nature Genetics. 2008; 452:633 Cases/controls From different populations 18TRiG Curriculum: Lecture 2March 2012
  • Slide 19
  • Genotype calling Hybridization intensities translated into genotypes Large SNP numbers requires automated procedure Recent algorithms clustering/pooling strategies Raw hybridization intensities normalized Information combined across different samples at each SNP Assign genotypes to entire clusters For each sample, estimate probability of each of three genotype calls at each SNP Genotype assigned based on defined threshold of probability Missing genotypes dependent on algorithm & threshold used Teo YY, Curr Op in Lipidology. 2008; 19:133 19TRiG Curriculum: Lecture 2March 2012
  • Slide 20
  • Genotyping - Limitations & quality control Accuracy of algorithm Depends on number of samples in each cluster Prone to errors for small number of samples or SNPs with rare alleles High rates of missing genotypes: Array problems plating/synthesis issue Poor quality DNA degradation Hybridization failure Differential performance between SNPs Excess heterozygosity - sample contamination? Just another laboratory test 20TRiG Curriculum: Lecture 2March 2012
  • Slide 21
  • Analyzed 8,101 genes on chip microarrays Reference= pooled cell lines Breast cancer subgroups Perou CM, et al. Nature. 2000; 406, 747 21TRiG Curriculum: Lecture 2March 2012
  • Slide 22
  • Original two probe strategy for expression profiling on cDNA arrays Duggan DJ, et al., Nature Genetics. 1999; 21:10 22TRiG Curriculum: Lecture 2March 2012
  • Slide 23
  • Expression profiling: challenges and limitations Biological Dynamic & complex nature of gene expression Heterogeneous nature of tissue samples Variation in RNA quality Technological Reproducibility across microarray platforms Selection of probes dependence on binding efficiency Controlling for technical variability Statistical/bioinformatic Adequate experimental design Normalization to remove variability among chips Multiple testing correction Validation of results Just another laboratory test 23TRiG Curriculum: Lecture 2March 2012
  • Slide 24
  • Copy number variation: Comparative genomic hybridization CG H Array-CGH Metaphase Chromosomes Arrayed DNAs Tumor DNAReference DNA Hybridization Deletion Gain Deletion Gain 24 http://www.advalytix.com/advalytix/hybridization_330.htm TRiG Curriculum: Lecture 2March 2012
  • Slide 25
  • Constitutional genomic imbalances detected by copy number arrays 10.9 Mb deletion at 7q11 7.2 Mb duplication on 11q Miller DT, et al, Amer J Hum Genet. 2010; 86:749 25TRiG Curriculum: Lecture 2March 2012
  • Slide 26
  • Copy number - Limitations & quality control Artifacts may be caused by: GC content Wavy patterns correlate with GC content Algorithms developed to remove waviness DNA sample quantity and quality Can impact on level of signal noise and false positive rate Whole genome amplification associated with signal noise Sample composition In cancer studies, normal cells dilute cancer aberrations Tumor heterogeneity will also affect copy number 26 Just another laboratory test TRiG Curriculum: Lecture 2March 2012
  • Slide 27
  • What we will cover today: Types of genetic alterations Current and future genetic test methods Cytogenetics, in situ hybridization, PCR Gene chips Genotyping Expression profiling Copy number variation Next generation sequencing (NGS) Whole genome Transcriptome 27TRiG Curriculum: Lecture 2March 2012
  • Slide 28
  • Cancer Treatment : NGS in AML Welch JS, et al. JAMA, 2011;305, 1577 28TRiG Curriculum: Lecture 2March 2012
  • Slide 29
  • Case History 39 year old female with APML by morphology Cytogenetics and RT-PCR unable to detect PML-RAR fusion Clinical question: Treat with ATRA versus allogeneic stem cell transplant 29TRiG Curriculum: Lecture 2March 2012
  • Slide 30
  • Methods/Results Paired-end NGS sequencing Result: Cytogenetically cryptic event: novel fusion protein Took 7 weeks 30TRiG Curriculum: Lecture 2March 2012
  • Slide 31
  • 77-kilobase segment from Chr. 15 was inserted en bloc into the second intron of the gene RARA on Chr. 17. 31TRiG Curriculum: Lecture 2March 2012
  • Slide 32
  • Workflow Image processing and base calling Raw Data Analysis Alignment to reference genome Whole Genome Mapping Detection of genetic variation (SNPs, Indels, SV) Variant Calling Linking variants to biological information Annotation 32TRiG Curriculum: Lecture 2March 2012
  • Slide 33
  • Overview of Paired End Sequencing Short Insert DNA Random Shearing Adapter s Ligated Annealed to Surface Sequenced Synthesized Sequencing done with labeled NTPs and massively parallel 33TRiG Curriculum: Lecture 2March 2012
  • Slide 34
  • Short read output format Read ID Sequence Quality line 34TRiG Curriculum: Lecture 2March 2012
  • Slide 35
  • Quality control is critical Just another laboratory test 35TRiG Curriculum: Lecture 2March 2012
  • Slide 36
  • Measuring Accuracy Phred is a program that assigns a quality score to each base in a sequence. These scores can then be used to trim bad data from the ends, and to determine how good an overlap actually is. Phred scores are logarithmically related to the probability of an error: a score of 10 means a 10% error probability; 20 means a 1% chance, 30 means a 0.1% chance, etc. A score of 20 is generally considered the minimum acceptable score. 36TRiG Curriculum: Lecture 2March 2012
  • Slide 37
  • Workflow Image processing and base calling Raw Data Analysis Alignment to reference genome Whole Genome Mapping Detection of genetic variation (SNPs, Indels, SV) Variant Calling Linking variants to biological information Annotation 37TRiG Curriculum: Lecture 2March 2012
  • Slide 38
  • Alignment/Mapping CCATAGGCTATATGCGCCCTATCGGCAATTTGCGGTATAC GCGCCCTA GCCCTATCG CCTATCGGA CTATCGGAAA AAATTTGC TTTGCGGT TTGCGGTA GCGGTATA GTATAC TCGGAAATT CGGAAATTT CGGTATAC TAGGCTATA GCCCTATCG CCTATCGGA CTATCGGAAA AAATTTGC TTTGCGGT TCGGAAATT CGGAAATTT AGGCTATAT GGCTATATG CTATATGCG CC CCA CCAT ATAC C CCAT CCATAGTATGCGCCC GGTATAC CGGTATAC GGAAATTTG CCATAGGCTATATGCGCCCTATCGGCAATTTGCGGTATAC ATAC CC GAAATTTGC Read depth is critical for accurate reconstruction 38TRiG Curriculum: Lecture 2March 2012
  • Slide 39
  • Alignment approaches AlignerDescription Illumina platform ELAND Vendor-provided aligner for Illumina data Bowtie Ultrafast, memory-efficient short-read aligner for Illumina data Novoalign A sensitive aligner for Illumina data that uses the NeedlemanWunsch algorithm SOAP Short oligo analysis package for alignment of Illumina data MrFAST A mapper that allows alignments to multiple locations for CNV detection SOLiD platform Corona-lite Vendor-provided aligner for SOLiD data SHRiMP Efficient SmithWaterman mapper with colorspace correction 454 Platform Newbler Vendor-provided aligner and assembler for 454 data SSAHA2 SAM-friendly sequence search and alignment by hashing program BWA-SW SAM-friendly SmithWaterman implementation of BWA for long reads Multi-platform BFAST BLAT-like fast aligner for Illumina and SOLiD data BWA Burrows-Wheeler aligner for Illumina, SOLiD, and 454 data Maq A widely used mapping tool for Illumina and SOLiD; now deprecated by BWA Koboldt DC, et al. Brief Bioinform 2010 Sep;11(5):484-98 39TRiG Curriculum: Lecture 2March 2012
  • Slide 40
  • Short read alignment Given a reference and a set of reads, report at least one good local alignment for each read if one exists Approximate answer to question: where in genome did read originate? TGATCATA GATCAA TGATCATA GAGAAT better than What is good? For now, we concentrate on: TGATATTA GATcaT TGATCATA GTACAT better than Fewer mismatches = better Failing to align a low-quality base is better than failing to align a high-quality base 40TRiG Curriculum: Lecture 2March 2012
  • Slide 41
  • Post alignment: what do you get? Alignment of reads including read pairs SAM file Read Pair CIGAR field Simplified pileup output Li H, et al. Bioinformatics. 2009;25:2078 41TRiG Curriculum: Lecture 2March 2012
  • Slide 42
  • Workflow Image processing and base calling Raw Data Analysis Alignment to reference genome Whole Genome Mapping Detection of genetic variation (SNPs, Indels, insertions) Variant Calling Linking variants to biological information Annotation 42TRiG Curriculum: Lecture 2March 2012
  • Slide 43
  • Discovering Genetic Variation SNPs ATCCTGATTCGGTGAACGTTATCGACGATCCGATCGA CGGTGAACGTTATCGACGATCCGATCGAACTGTCAGC GGTGAACGTTATCGACGTTCCGATCGAACTGTCAGCG TGAACGTTATCGACGTTCCGATCGAACTGTCAGCGGC GTTATCGACGATCCGATCGAACTGTCAGCGGCAAGCT TTATCGACGATCCGATCGAACTGTCAGCGGCAAGCT ATCCTGATTCGGTGAACGTTATCGACGATCCGATCGAACTGTCAGCGGCAAGCTGATCGATCGATCGATGCTAGTG TTATCGACGATCCGATCGAACTGTCAGCGGCAAGCT TCGACGATCCGATCGAACTGTCAGCGGCAAGCTGAT ATCCGATCGAACTGTCAGCGGCAAGCTGATCG CGAT TCCGATCGAACTGTCAGCGGCAAGCTGATCG CGATC TCCGATCGAACTGTCAGCGGCAAGCTGATCGATCGA GATCGAACTGTCAGCGGCAAGCTGATCG CGATCGA AACTGTCAGCGGCAAGCTGATCG CGATCGATGCTA TGTCAGCGGCAAGCTGATCGATCGATCGATGCTAG INDELs ATCCTGATTCGGTGAACGTTATCGACGATCCGATCGA TCAGCGGCAAGCTGATCGATCGATCGATGCTAGTG reference genome 43TRiG Curriculum: Lecture 2March 2012
  • Slide 44
  • 44TRiG Curriculum: Lecture 2March 2012
  • Slide 45
  • 45TRiG Curriculum: Lecture 2March 2012
  • Slide 46
  • Workflow Image processing and base calling Raw Data Analysis Alignment to reference genome Whole Genome Mapping Detection of genetic variation (SNPs, Indels, insertions) Variant Calling Linking variants to biological information Annotation 46TRiG Curriculum: Lecture 2March 2012
  • Slide 47
  • Where to go to annotate genomic data, determine clinical relevance? Online Mendelian Inheritance in Man (http://www.ncbi.nlm.nih.gov/omim)http://www.ncbi.nlm.nih.gov/omim International HapMap project (http://hapmap.ncbi.nlm.nih.gov)http://hapmap.ncbi.nlm.nih.gov Human genome mutation database (http://www.hgvs.org/dblist/glsdb.html) PharmGKB (http://www.pharmgkb.org)http://www.pharmgkb.org Scientific literature 47TRiG Curriculum: Lecture 2March 2012
  • Slide 48
  • Case-control study design = variable results Need for Clinical Grade Database Ease of use Continually updated Clinically relevant SNPs/variations 48 Ng PC, et al. Nature. 2009; 461: 724 TRiG Curriculum: Lecture 2March 2012
  • Slide 49
  • Cancer Treatment: NGS of Tumor Jones SJM, et al. Genome Biol. 2010;11:R82. 49TRiG Curriculum: Lecture 2March 2012
  • Slide 50
  • Case History 78 year old male Poorly differentiated papillary adenocarcinoma of tongue Metastatic to lymph nodes Failed chemotherapy Decision to use next- generation sequencing methods 50TRiG Curriculum: Lecture 2March 2012
  • Slide 51
  • Workflow Image processing and base calling Raw Data Analysis Alignment to reference genome Whole Genome Mapping Detection of genetic variation (SNPs, Indels, SV) Variant Calling Linking variants to biological information Annotation 51TRiG Curriculum: Lecture 2March 2012
  • Slide 52
  • Methods and Results Analysis Whole genome Transcriptome Findings Upregulation of RET oncogene Downregulation of PTEN 52TRiG Curriculum: Lecture 2March 2012
  • Slide 53
  • Transcriptome and Whole-exome Transcriptome Convert RNA to cDNA Perform sequencing Only expressed genes Can get expression levels Whole-exome Use selection procedure to enrich exons No intron data Results depends on selection procedure 53 Martin JA, Wang Z. Nat Rev Genet. 2011; 12:671. TRiG Curriculum: Lecture 2March 2012
  • Slide 54
  • A few words about samples Can use formalin-fixed paraffin-embedded tissue for whole-exome or transcriptome sequencing Need frozen tissue for whole-genome sequencing Better quality DNA Small quantity of DNA needed For whole-exome sequencing, amount off a few slides 54TRiG Curriculum: Lecture 2March 2012
  • Slide 55
  • Summary Gene chips SNPs Expression profiling Copy number variation Major steps in NGS Base calling Alignment Variant calling Annotation Technology will change but just another test Accuracy Precision Need to validate findings with traditional methods Roychowdhury S, et al. Sci Transl Med. 2011; 3: 111ra121 55TRiG Curriculum: Lecture 2March 2012