Forsharing cshl2011 sequencing

High-Resolution Views of Cancer Genomes

The Central Dogma

Your Nature Paper

Our First Experiment

Overview of BAC in the Genome

Sequencing a BAC

Sequence Coverage

Repeats

Repeats are not created equal

Genomic Sequencing

Targeting the Exome

  Long oligos synthesized on arrays (DNA)

  RNA baits synthesized from DNA oligo template

  RNA baits hybridized to DNA sequencing library

  Targets captured using beads and biotin-labeled baits

  RNA bait degraded, leaving sequencing library enriched for target regions

Data Flow

  FASTQ files generated by Illumina pipeline

  Aligned to reference genome (hg18, excluding _random, unmapped, and hap) using Novoalign

  SAM/BAM used extensively

  Follow Broad Institute GATK pipeline for exome capture

  Use Picard Java library for quality assessment

  Processed BAM files available via local http for browsing

Data Pipeline....

  Samtools import

  Samtools sort

  Picard MarkDuplicates

  GATK Indel Realignment

  GATK Quality Recalibration

  Picard QC metrics
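The duplicate-marking step in the list above can be illustrated with a toy sketch. This is not Picard's implementation. It assumes single-end reads described by a hypothetical `Read` record and flags every read except the highest-quality one among reads that share chromosome, 5' position, and strand.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal read record, not a real SAM/BAM API.
    name: str
    chrom: str
    pos5: int      # 5' alignment position
    strand: str    # "+" or "-"
    qual_sum: int  # sum of base qualities


def mark_duplicates(reads):
    """Return the set of read names flagged as duplicates.

    Reads sharing chromosome, 5' position, and strand are treated as
    copies of one library fragment; the read with the highest summed
    base quality is kept and the rest are flagged.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos5, read.strand)].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.qual_sum)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


reads = [
    Read("r1", "chr1", 100, "+", 900),
    Read("r2", "chr1", 100, "+", 950),  # same fragment as r1, higher quality
    Read("r3", "chr1", 100, "-", 800),  # opposite strand, separate group
    Read("r4", "chr2", 100, "+", 700),
]
print(sorted(mark_duplicates(reads)))
```

Only `r1` is flagged here, because `r2` covers the same position and strand with a higher quality sum.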

Realignment around Indels

  The problem

-  Aligners align each read independently

-  Potentially leads to increased error rates around indels

  A potential solution

-  Locally realign reads in regions that might harbor an indel

-  Goal is to align reads overlying indels more accurately, reducing errors in each read and, in turn, reducing SNV call error rates
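A toy sketch of why local realignment helps, under strong simplifying assumptions: the candidate deletion is given rather than discovered, reads are placed without gaps, and all names are hypothetical. It is not the GATK algorithm. It only compares the total mismatches of a set of reads against the reference with their mismatches against a haplotype carrying the deletion.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_ungapped(read, haplotype):
    """Fewest mismatches over all ungapped placements of read on haplotype."""
    return min(
        hamming(read, haplotype[i:i + len(read)])
        for i in range(len(haplotype) - len(read) + 1)
    )


def total_mismatches(reads, haplotype):
    """Sum of each read's best ungapped mismatch count."""
    return sum(best_ungapped(read, haplotype) for read in reads)


ref = "ACGTACGGTCAAGCTTGACCATGGATCCGTAA"
alt = ref[:14] + ref[17:]  # candidate haplotype: 3-base deletion of ref[14:17]

# Reads drawn from the sample, all overlapping the deletion breakpoint.
reads = [alt[4:16], alt[8:20], alt[11:23]]

against_ref = total_mismatches(reads, ref)
against_alt = total_mismatches(reads, alt)
print(against_ref, against_alt)

# Prefer the indel-containing haplotype when it explains the reads better.
realign = against_alt < against_ref
```

Aligned one read at a time against the reference, each read picks up mismatches near the breakpoint that could be mistaken for SNVs. Considered together against the deletion haplotype, the same reads match exactly.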

Quality Recalibration

  Since most SNV callers will rely on quality scores to estimate error probabilities, having the best possible estimates for error rates is important

  Reported error rates from the Illumina sequencer generally reflect technical parameters of the base call process, but not other systematic biases

  Quality recalibration can include covariates to account for systematic biases

-  Cycle count, dinucleotide context, original quality, and sample/library variables
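A minimal sketch of the recalibration idea, assuming bases are binned by the covariates listed above and that mismatches at sites not known to vary count as errors. The Phred relation Q = -10 log10(P) is standard. The pseudocounts and the tuple layout are assumptions of this sketch, not the GATK implementation.

```python
import math
from collections import defaultdict


def phred_to_error(q):
    """Phred quality Q -> error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Error probability -> Phred quality, Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def recalibration_table(observations):
    """Empirical quality per covariate bin.

    observations: iterable of (reported_q, cycle, dinucleotide, is_mismatch)
    tuples. Returns {(reported_q, cycle, dinucleotide): empirical quality}.
    The +1/+2 pseudocounts are an assumption of this sketch; they avoid
    log(0) in bins with no mismatches.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [mismatches, total]
    for reported_q, cycle, dinuc, is_mismatch in observations:
        bin_counts = counts[(reported_q, cycle, dinuc)]
        bin_counts[0] += int(is_mismatch)
        bin_counts[1] += 1
    return {
        key: error_to_phred((mismatches + 1) / (total + 2))
        for key, (mismatches, total) in counts.items()
    }


# Hypothetical bin: 998 bases reported as Q30 at cycle 1 after "AC",
# of which 9 disagree with the reference.
observations = [(30, 1, "AC", True)] * 9 + [(30, 1, "AC", False)] * 989
table = recalibration_table(observations)
print(round(table[(30, 1, "AC")], 2))
```

In this made-up bin the reported Q30 (1 error in 1,000) is optimistic: the observed mismatch rate is about 1 in 100, so the recalibrated quality comes out near Q20.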

Variant Calling and Evaluation

A developing art

Sequencing Tumor/Normal Pairs

Good SNP

Suspect Variant

Somatic (tumor only) Variant

Likely False Positive (normal only)
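The tumor-only and normal-only labels above amount to a set comparison between the two call sets. The sketch below assumes variants are keyed by hypothetical (chrom, pos, ref, alt) tuples and reads "Good SNP" as a variant called in both samples, which is an interpretation rather than something the slides state. The "Suspect Variant" category is not modeled because its criteria are not given in the text.

```python
def classify_pair(tumor_calls, normal_calls):
    """Label each variant by which sample(s) it was called in."""
    labels = {}
    for variant in tumor_calls | normal_calls:
        in_tumor = variant in tumor_calls
        in_normal = variant in normal_calls
        if in_tumor and in_normal:
            labels[variant] = "germline (called in both)"
        elif in_tumor:
            labels[variant] = "somatic (tumor only)"
        else:
            labels[variant] = "likely false positive (normal only)"
    return labels


# Hypothetical call sets.
tumor = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}
normal = {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")}

labels = classify_pair(tumor, normal)
for variant in sorted(labels):
    print(variant, labels[variant])
```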

NCI60 Exome Sequencing

No Normals Available!

Variants by Genomic Location

All Coding Variants

Type 1: in dbSNP, Type 2: not in dbSNP

Coding, novel (no dbSNP)
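With no matched normals, dbSNP membership is the split shown above. A minimal sketch follows, assuming variants and dbSNP entries share a hypothetical (chrom, pos, ref, alt) key. Real dbSNP lookups need more care with coordinates and allele representation.

```python
def split_by_dbsnp(variants, dbsnp):
    """Return (type1, type2): variants found in dbSNP, and variants not found."""
    type1 = [v for v in variants if v in dbsnp]
    type2 = [v for v in variants if v not in dbsnp]
    return type1, type2


# Hypothetical data for illustration only.
dbsnp = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
coding_variants = [
    ("chr1", 1000, "A", "G"),  # known: Type 1
    ("chr3", 42, "G", "A"),    # novel: Type 2
]

type1, type2 = split_by_dbsnp(coding_variants, dbsnp)
print(len(type1), len(type2))
```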

Copy Number from Exomes

Complete Genome Sequencing

Complete Genomics Data

  Delivery   Via USB results

  Storage

  Sizes are LARGE

-  400 GB per sample as delivered with raw reads included

  Should use 2-location backed-up storage

-  Not trivial to find such storage, so might resort to multiple USB drives

  Minimize:

-  Data movement

-  Keeping multiple copies indefinitely
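A back-of-the-envelope sketch of the storage requirement, using the 400 GB per sample and two-location figures above. The sample count is a made-up example, and 1 TB is taken as 1000 GB.

```python
GB_PER_SAMPLE = 400  # from the slide: as delivered, raw reads included


def storage_tb(n_samples, n_copies=2, gb_per_sample=GB_PER_SAMPLE):
    """Total storage in TB (1 TB = 1000 GB) for n_copies of every sample."""
    return n_samples * n_copies * gb_per_sample / 1000


# Hypothetical project of 10 genomes kept in two locations.
print(storage_tb(10))
```

Each extra copy kept indefinitely adds another 0.4 TB per sample, which is the reason to minimize data movement and duplicate copies.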

Breakdown of Data Sizes

  Delivery   Storage   Processing

  Data are typically tab-delimited text files, so Excel can be useful for examining individual small files

  Generally, command-line tools needed

  MacOS and Linux only supported operating systems, but Windows might work....

  Some analyses (snpdiff) require large memory
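For files too large for Excel, a streaming reader keeps memory use flat. This is a generic sketch for tab-delimited text. The `#` comment prefix, the column names, and the example rows are hypothetical and are not taken from the vendor's file specification.

```python
import csv
import io


def iter_rows(handle, comment_prefix="#"):
    """Yield each data row of a tab-delimited stream as a dict.

    Blank lines and lines starting with comment_prefix are skipped; the
    first remaining line is taken as the column header. Rows are read one
    at a time, so the whole file is never held in memory.
    """
    lines = (
        line for line in handle
        if line.strip() and not line.startswith(comment_prefix)
    )
    yield from csv.DictReader(lines, delimiter="\t")


# Small in-memory stand-in for a large file on disk.
example = io.StringIO(
    "#SAMPLE\texample\n"
    "\n"
    "chromosome\tbegin\tend\tvarType\n"
    "chr1\t100\t101\tsnp\n"
    "chr1\t250\t252\tdel\n"
    "chr2\t300\t301\tsnp\n"
)

n_snps = sum(1 for row in iter_rows(example) if row["varType"] == "snp")
print(n_snps)
```

For a real file, pass an open file handle in place of the `io.StringIO` object.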

Directory Structure

Workflows

  Tumor/Normal

-  Copy Number

-  Structural Variation

-  Annotated Somatic Variants

  Germline

-  List of annotated genotypes per individual, summarized into a single file that can be used for filtering

Germline Workflow

  Output

  Future directions

-  Be “smarter” about inheritance framework

-  Further refinements of comparison to other data types (exomes, SNP arrays, RNA-seq)

Tumor/Normal Workflow

Medvedev et al., Nature 2009

The Cancer Genome Atlas Research Network Nature 000, 1-8 (2008) doi:10.1038/nature07385

Frequent genetic alterations in three critical signalling pathways.

Chromatin

  Chromatin is the complex of protein and DNA that makes up the chromosomes. It is not a static structure.

  DNAse is an enzyme that cuts DNA at locations where DNA is accessible

  These “accessible” regions have been associated with open chromatin

  Regions of open chromatin are necessary for transcriptional and regulatory machinery to have access to gene neighborhoods and facilitate transcription

DNAse Hypersensitivity

  Method for finding regions of “open” chromatin

  In data published with the ENCODE consortium, DNAse hypersensitive (HS) sites were shown to be correlated with:

-  Histone modification

-  Transcription start sites

-  Early replicating regions

-  Transcription factor binding sites (experimentally determined by ChIP/chip, etc.)

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. The ENCODE Consortium. Nature, 2007.

DNAse-chip Method

Crawford, G.E., Davis, S., Scacheri, P.C., Renaud, G., Halawi, M.J., Erdos, M.R., Green, R., Meltzer, P.S., Wolfsberg, T.G., and Collins, F.S. Nat Methods, 2006

DNAse-Seq Method

Crawford, G.E., Davis, S., Scacheri, P.C., Renaud, G., Halawi, M.J., Erdos, M.R., Green, R., Meltzer, P.S., Wolfsberg, T.G., and Collins, F.S. Nat Methods, 2006

DNAse Sites Relative to Genes

DNAse HS Sites and Gene Expression

  DNAse HS sites near transcription start sites are associated with actively transcribed genes.

  Distances between sequences in non-DNAse HS regions have an oscillating pattern with a period that corresponds to a single turn of the double helix

  DNAse is known to cut preferentially in the minor groove, which is exposed every 10.4 bases when wrapped around a nucleosome

  A nucleosome is wrapped by 147 base pairs of DNA

  Implication: Nucleosomes are positioned in a highly organized, precise manner
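The two numbers above can be combined in a short worked example. It assumes an idealized, perfectly uniform 10.4-base periodicity along the full 147 base pairs, which real nucleosomal DNA only approximates.

```python
BASES_PER_TURN = 10.4  # from the slide: minor groove exposed every 10.4 bases
NUCLEOSOME_BP = 147    # from the slide: DNA wrapped around one nucleosome

# Number of helical turns of DNA on one nucleosome.
turns = NUCLEOSOME_BP / BASES_PER_TURN

# Idealized positions (in bases from one end) where the minor groove
# faces outward and is therefore accessible to cutting.
exposed = [round(k * BASES_PER_TURN) for k in range(int(turns) + 1)]

# Spacing between consecutive accessible positions.
spacings = [b - a for a, b in zip(exposed, exposed[1:])]

print(round(turns, 2))
print(exposed)
print(sorted(set(spacings)))
```

Under this assumption the DNA makes roughly 14 turns around the nucleosome, and accessible positions fall 10 or 11 bases apart. That spacing is the kind of regular pattern the oscillating distance signal reflects.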

Nucleosome Positioning

The Last Mile
