RNA-seq: A High-resolution View of the Transcriptome

Sean Davis, M.D., Ph.D., Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health

Page 1: RNA-seq: A High-resolution View of the Transcriptome

Sean Davis, M.D., Ph.D.

Genetics Branch, Center for Cancer Research

National Cancer Institute

National Institutes of Health

RNA-seq: A high-resolution View of the Transcriptome

Page 2: RNA-seq: A High-resolution View of the Transcriptome

Normal Karyotype

Tumor Karyotype

Page 3: RNA-seq: A High-resolution View of the Transcriptome

The Central Dogma

Page 4: RNA-seq: A High-resolution View of the Transcriptome

phenotype

Gene Copy Number

Sequence Variation

Chromatin Structure and Function

Gene Expression

Transcriptional Regulation

DNA Methylation

Patient and Population Characteristics

Page 5: RNA-seq: A High-resolution View of the Transcriptome
Page 6: RNA-seq: A High-resolution View of the Transcriptome

+

Page 7: RNA-seq: A High-resolution View of the Transcriptome
Page 8: RNA-seq: A High-resolution View of the Transcriptome

=

Page 9: RNA-seq: A High-resolution View of the Transcriptome

Your Nature Paper

Page 10: RNA-seq: A High-resolution View of the Transcriptome

High Throughput Sequencing (AKA NGS)

Page 11: RNA-seq: A High-resolution View of the Transcriptome

Illumina SBS Technology: Reversible Terminator Chemistry Foundation

[Figure: Illumina sequencing workflow. Panel labels: DNA (0.1-1.0 ug), Sample preparation, Single molecule array, Cluster growth, Sequencing, Image acquisition, Base calling. Example base calls: T G C T A C G A T …]

© Illumina, Inc.

http://www.illumina.com/technology/sequencing_technology.ilmn

http://seqanswers.com/forums/showthread.php?t=21

Page 12: RNA-seq: A High-resolution View of the Transcriptome

Single end vs paired end sequencing

Illumina Paired-end sequencing

Paired-end: useful for RRBS, essential for RNA-seq, not useful for ChIP-seq

Page 13: RNA-seq: A High-resolution View of the Transcriptome

What comes out of the machine: short reads in fastq format

@D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1
CTCCTGGAAAACGCTTTGGTAGATTTGGCCAGGAGCTTTCTTTTATGTAAATTG
+D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1
[^^cedeefee`cghhhfcRX`_gfghf^bZbecg^eeb[caef`ef^a_`eXa
@D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/1
TCCANCCATGGCAAATTCCATGGCACCGTCAAGGCTGAGAACGGGAAGCTTGTC
+D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/1
ab_eBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/1
TACAAGTGCAGCATCAAGGAGCGAATGCTCTACTCCAGCTGCAAGAGCCGCCTC
+D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/1
_[_ceeec[^eeghdffffhh^efh_egfhfgeec_fbafhhhhd`caegfheh
@D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/1
GAAGGAGAGAAGGGGAGGAGGGCGGGGGGCACCTACTACATCGCCCTCCACATC
+D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/1
\^_accceg`gga`f[fgcb`Ucgfaa_LVV^[bbbbbRWW`W^Y[_[^bbbbb
@D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/1
GTGGCCGATTCCTGAGCTGTGTTTGAGGAGAGGGCGGAGTGCCATCTGGGTAGC
+D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/1
aa_eeeeegggggihhiiifgeghfeghbgcghifiidg^dbgggeeeee`dcd
@D3B4KKQ1_0166:8:1101:2358:2174#CGATGT/1
CTGACCTGGGTCCTGTGGTGCTCAGCCTTTTGAAGATGCCAGAAAAATACGTCG
+D3B4KKQ1_0166:8:1101:2358:2174#CGATGT/1
\^_cccccg^Y`ega`fg`ebegfhd^egghhghfffhghdhbfffhhhfgfcf

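Quality score (QS) to integer in R: as.integer(charToRaw('e')) - 33

(Offset 33 is the Sanger / Illumina 1.8+ encoding. The reads above, with quality characters up to 'i' and runs of 'B', appear to use the older Illumina Phred+64 encoding, for which the offset would be 64.)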
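A minimal sketch of the conversion on this slide, extended from one character to a whole quality string. The function names `phred_scores` and `error_prob` and the `offset` argument are illustrative and not from the slides; which offset applies (33 or 64) depends on the pipeline that produced the file.

```r
# Convert a FASTQ quality string to integer Phred scores.
# `offset` is 33 for Sanger / Illumina 1.8+ files and 64 for older Illumina files.
phred_scores <- function(qual, offset = 33L) {
  as.integer(charToRaw(qual)) - as.integer(offset)
}

# A Phred score Q corresponds to a base-call error probability of 10^(-Q/10).
error_prob <- function(q) 10^(-q / 10)

phred_scores("e")                 # offset 33, as written on the slide
phred_scores("e", offset = 64L)   # the same character read as Phred+64
error_prob(c(10, 20, 30))         # error probabilities for Q10, Q20, Q30
```

Passing a full quality line, for example `phred_scores("ab_eBBBB", offset = 64L)`, returns one score per base.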

Page 14: RNA-seq: A High-resolution View of the Transcriptome

Paired-end sequencing

s_8_1_sequence.txt.gz

@D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1
CTCCTGGAAAACGCTTTGGTAGATTTGGCCAGGAGCTTTCTTTTATGTAAATTG
+D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1
[^^cedeefee`cghhhfcRX`_gfghf^bZbecg^eeb[caef`ef^a_`eXa
@D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/1
TCCANCCATGGCAAATTCCATGGCACCGTCAAGGCTGAGAACGGGAAGCTTGTC
+D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/1
ab_eBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/1
TACAAGTGCAGCATCAAGGAGCGAATGCTCTACTCCAGCTGCAAGAGCCGCCTC
+D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/1
_[_ceeec[^eeghdffffhh^efh_egfhfgeec_fbafhhhhd`caegfheh
@D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/1
GAAGGAGAGAAGGGGAGGAGGGCGGGGGGCACCTACTACATCGCCCTCCACATC
+D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/1
\^_accceg`gga`f[fgcb`Ucgfaa_LVV^[bbbbbRWW`W^Y[_[^bbbbb
@D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/1
GTGGCCGATTCCTGAGCTGTGTTTGAGGAGAGGGCGGAGTGCCATCTGGGTAGC
+D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/1
aa_eeeeegggggihhiiifgeghfeghbgcghifiidg^dbgggeeeee`dcd

s_8_2_sequence.txt.gz

@D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/2
GGCATATTTAACAGCATTGAACAGAATTCTGTGTCCTGTAAAAAAATTAGCTTA
+D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/2
a__aaa`ce`cgcffdf_acda^ea]befffbeged`g[a`e_caaac]cb`gb
@D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/2
TTGAGGCTGTTGTCATACTTCTCATGGTTCACACCCATGACGAACATGGGGGCG
+D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/2
a__eeeeeggegefhhhiiihhhhhiieghhhghhiiffhiififhhiihegic
@D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/2
CGGGGTGCACCTCGTCGTAGAGGAACTCTGCCGTCAGCTCTGCCCCATCGCCAA
+D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/2
^__ee__cge`cghghhfgddgfgi]ehhfffff^ec[beegidffhhfhadba
@D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/2
CTTAGTCTCAGTTTTCCTCCAGCAGCCTGAGGAAACTCAAAGGCACAGTTCCCA
+D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/2
_abeaaacg^g^eghhhhgafghhdfghfedeghfiiicfbgdHYagfeecggf
@D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/2
TAGGCTCAAAGTCTAACGCCAATCCCGAACCTGGGCATCTGTACACACACACAC
+D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/2
abbeceeegggcghiihiihhhhiifhiiiiihiiiiiiihegh`eggfebfhg

… …
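In the two files on this slide, mates share a read ID and differ only in the trailing /1 or /2. A small sketch of checking that two ID lists are in the same order, which aligners generally expect, might look like the following. The helper names `strip_mate` and `mates_in_sync` are hypothetical; the IDs are copied from the slide.

```r
# Read IDs taken from the two FASTQ excerpts on the slide.
ids_1 <- c("D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1",
           "D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/1")
ids_2 <- c("D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/2",
           "D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/2")

# Drop the trailing /1 or /2 mate suffix.
strip_mate <- function(id) sub("/[12]$", "", id)

# TRUE when both lists name the same fragments in the same order.
mates_in_sync <- function(a, b) {
  length(a) == length(b) && all(strip_mate(a) == strip_mate(b))
}

mates_in_sync(ids_1, ids_2)        # same order in both files
mates_in_sync(ids_1, rev(ids_2))   # second file reordered
```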

Page 15: RNA-seq: A High-resolution View of the Transcriptome
Page 16: RNA-seq: A High-resolution View of the Transcriptome
Page 17: RNA-seq: A High-resolution View of the Transcriptome

RNA-seq protocol schematic

Page 18: RNA-seq: A High-resolution View of the Transcriptome

Our First Experiment

Page 19: RNA-seq: A High-resolution View of the Transcriptome

Overview of BAC in the Genome

Page 20: RNA-seq: A High-resolution View of the Transcriptome

Sequencing a BAC

Page 21: RNA-seq: A High-resolution View of the Transcriptome

Sequence Coverage

Page 22: RNA-seq: A High-resolution View of the Transcriptome

Repeats

Page 23: RNA-seq: A High-resolution View of the Transcriptome

Repeats

Page 24: RNA-seq: A High-resolution View of the Transcriptome

Repeats are not created equal

Page 25: RNA-seq: A High-resolution View of the Transcriptome

Approaches to RNA-seq

Nature Biotech (2010) 28, 421-423

Page 26: RNA-seq: A High-resolution View of the Transcriptome

Alignment

Page 27: RNA-seq: A High-resolution View of the Transcriptome

RNA-seq Alignment

Page 28: RNA-seq: A High-resolution View of the Transcriptome
Page 29: RNA-seq: A High-resolution View of the Transcriptome
Page 30: RNA-seq: A High-resolution View of the Transcriptome

Run Time

Page 31: RNA-seq: A High-resolution View of the Transcriptome

Alignment Yield

Page 32: RNA-seq: A High-resolution View of the Transcriptome

Splice Read Placement Accuracy

Page 33: RNA-seq: A High-resolution View of the Transcriptome

Impact on Transcript Assembly

Page 34: RNA-seq: A High-resolution View of the Transcriptome

Transcript Quantification

Page 35: RNA-seq: A High-resolution View of the Transcriptome
Page 36: RNA-seq: A High-resolution View of the Transcriptome

Models for RNA-seq

• Count-based models
• Multi-reads (isoform resolution)
• Paired-end reads (include length resolution step)
• Positional bias along transcript length
• Sequence bias

Page 37: RNA-seq: A High-resolution View of the Transcriptome

Read Counting

Page 38: RNA-seq: A High-resolution View of the Transcriptome

Mortazavi, 2008, NMeth
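The paper cited on this slide (Mortazavi et al., 2008, Nature Methods) introduced RPKM, reads per kilobase of transcript per million mapped reads, as a length- and depth-normalized read count. The following is a minimal sketch of that calculation; the function name `rpkm`, the gene names, and the numbers are made up for illustration and are not from the slides.

```r
# RPKM = reads for the gene / (gene length in kb) / (total mapped reads in millions)
rpkm <- function(counts, lengths_bp, total_mapped = sum(counts)) {
  counts / (lengths_bp / 1e3) / (total_mapped / 1e6)
}

# Toy data: three genes with different lengths (hypothetical values).
counts     <- c(geneA = 500, geneB = 1000, geneC = 500)
lengths_bp <- c(geneA = 1000, geneB = 4000, geneC = 500)

rpkm(counts, lengths_bp)
```

In this toy example geneB has the most reads but, being the longest gene, the lowest RPKM.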

Page 39: RNA-seq: A High-resolution View of the Transcriptome

L. Pachter (2011) arXiv:1104.3889v

Page 40: RNA-seq: A High-resolution View of the Transcriptome

Sequence Bias--priming

Hansen (2010), NAR

Page 41: RNA-seq: A High-resolution View of the Transcriptome

Sample-specific Sequence Bias

Page 42: RNA-seq: A High-resolution View of the Transcriptome
Page 43: RNA-seq: A High-resolution View of the Transcriptome

Models for RNA-seq

Page 44: RNA-seq: A High-resolution View of the Transcriptome

Result of Quantification

Page 45: RNA-seq: A High-resolution View of the Transcriptome

Clustering and Visualization

Page 46: RNA-seq: A High-resolution View of the Transcriptome
Page 47: RNA-seq: A High-resolution View of the Transcriptome
Page 48: RNA-seq: A High-resolution View of the Transcriptome
Page 49: RNA-seq: A High-resolution View of the Transcriptome
Page 50: RNA-seq: A High-resolution View of the Transcriptome
Page 51: RNA-seq: A High-resolution View of the Transcriptome

Hierarchical Clustering

[Figure: Gene 1 to Gene 8]

Page 52: RNA-seq: A High-resolution View of the Transcriptome

Hierarchical Clustering

[Figure: Gene 1 to Gene 8]

Page 53: RNA-seq: A High-resolution View of the Transcriptome

Hierarchical Clustering

[Figure: Gene 1 to Gene 8]

Page 54: RNA-seq: A High-resolution View of the Transcriptome

Hierarchical Clustering

[Figure: Gene 1 to Gene 8]

Page 55: RNA-seq: A High-resolution View of the Transcriptome

Distance Metrics

Euclidean distance

Manhattan distance

Minkowski distance (generalized distance)
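As a sketch of how the three distances named on this slide relate: Minkowski distance with exponent p reduces to Manhattan distance at p = 1 and to Euclidean distance at p = 2. The helper `minkowski` and the example vectors are illustrative; base R's `dist()` offers the same metrics through its `method` and `p` arguments.

```r
x <- c(1, 2, 3)
y <- c(4, 6, 3)

# Minkowski distance: (sum over i of |x_i - y_i|^p)^(1/p)
minkowski <- function(x, y, p) sum(abs(x - y)^p)^(1 / p)

manhattan <- minkowski(x, y, p = 1)   # sum of absolute differences
euclidean <- minkowski(x, y, p = 2)   # straight-line distance

# The same metrics via base R, which computes distances between matrix rows.
d_manhattan <- dist(rbind(x, y), method = "manhattan")
d_euclidean <- dist(rbind(x, y), method = "euclidean")
d_minkowski <- dist(rbind(x, y), method = "minkowski", p = 3)
```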

Page 56: RNA-seq: A High-resolution View of the Transcriptome

Distance Metrics

• Correlation
  – maximum value of 1 if X and Y are perfectly correlated
  – minimum value of -1 if X and Y are exactly opposite
  – d(X,Y) = 1 - r_xy
• Many, many others
• Choice of distance metric can be driven by underlying data (e.g., binary data, categorical data, outliers, etc.)
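A short sketch of the correlation distance from this slide, d(X,Y) = 1 - r_xy, using Pearson correlation from base R's `cor()`. The helper name `cor_dist` and the vectors are illustrative only.

```r
# Correlation distance: 0 for perfectly correlated vectors, 2 for exactly opposite ones.
cor_dist <- function(x, y) 1 - cor(x, y)

x <- c(1, 2, 3, 4, 5)

d_same     <- cor_dist(x, 2 * x + 1)   # same shape, different scale and offset
d_opposite <- cor_dist(x, -x)          # exactly opposite pattern
```

The first pair is far apart in Euclidean terms yet has correlation distance 0, one reason the choice of metric matters for expression profiles.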

Page 57: RNA-seq: A High-resolution View of the Transcriptome

Example of Distance Metric Choice

Page 58: RNA-seq: A High-resolution View of the Transcriptome

Example

• dat = matrix(rnorm(10000),ncol=20)
• dat[1:100,1:10] = dat[1:100,1:10]+1
• hclust
• dist
• as.dist(1-cor)
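The fragments on this slide can be assembled into a runnable sketch as follows. The slide does not say whether rows or columns are clustered; clustering the 20 columns (treated here as samples), the seed, and the comparison with `cutree()` are assumptions made for illustration.

```r
set.seed(1)                                # arbitrary seed, for reproducibility only
dat <- matrix(rnorm(10000), ncol = 20)     # 500 rows (genes) x 20 columns (samples)
dat[1:100, 1:10] <- dat[1:100, 1:10] + 1   # shift 100 genes in the first 10 samples

# Euclidean distance between samples; dist() works on rows, hence t().
hc_euclid <- hclust(dist(t(dat)))

# Correlation distance between samples; cor() works on columns.
hc_cor <- hclust(as.dist(1 - cor(dat)))

# Compare the two-group split from each tree with the known grouping.
truth <- rep(c("shifted", "unshifted"), each = 10)
table(truth, euclid = cutree(hc_euclid, k = 2))
table(truth, correlation = cutree(hc_cor, k = 2))
```

Calling `plot(hc_euclid)` and `plot(hc_cor)` draws the two dendrograms for a visual comparison.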

Page 59: RNA-seq: A High-resolution View of the Transcriptome
Page 60: RNA-seq: A High-resolution View of the Transcriptome
Page 61: RNA-seq: A High-resolution View of the Transcriptome

Differential Expression

Page 62: RNA-seq: A High-resolution View of the Transcriptome

MA Plot
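For reference, an MA plot shows for each gene the log ratio between two conditions (M) against the average log expression (A). A minimal sketch of computing the two coordinates from count vectors is below; the function name `ma_values`, the pseudocount, and the toy counts are assumptions for illustration, not values from the slides.

```r
# M = log2(y) - log2(x), A = (log2(x) + log2(y)) / 2
# A pseudocount avoids taking the log of zero counts.
ma_values <- function(x, y, pseudo = 1) {
  lx <- log2(x + pseudo)
  ly <- log2(y + pseudo)
  data.frame(A = (lx + ly) / 2, M = ly - lx)
}

# Toy counts for three genes in two conditions (hypothetical values).
cond1 <- c(3, 7, 0)
cond2 <- c(15, 7, 3)
ma <- ma_values(cond1, cond2)
ma
```

`plot(ma$A, ma$M)` followed by `abline(h = 0)` gives the usual display, with unchanged genes scattered around M = 0.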

Page 63: RNA-seq: A High-resolution View of the Transcriptome
Page 64: RNA-seq: A High-resolution View of the Transcriptome
Page 65: RNA-seq: A High-resolution View of the Transcriptome
Page 66: RNA-seq: A High-resolution View of the Transcriptome
Page 67: RNA-seq: A High-resolution View of the Transcriptome
Page 68: RNA-seq: A High-resolution View of the Transcriptome
Page 69: RNA-seq: A High-resolution View of the Transcriptome
Page 70: RNA-seq: A High-resolution View of the Transcriptome
Page 71: RNA-seq: A High-resolution View of the Transcriptome

DE False Positive Rates

Page 72: RNA-seq: A High-resolution View of the Transcriptome

DE Evaluation

Page 73: RNA-seq: A High-resolution View of the Transcriptome

DE Software Runtime

Page 74: RNA-seq: A High-resolution View of the Transcriptome
Page 75: RNA-seq: A High-resolution View of the Transcriptome

RNA-seq workflow as proposed by Anders et al. in Nature Protocols

Page 76: RNA-seq: A High-resolution View of the Transcriptome

MA Plot

Page 77: RNA-seq: A High-resolution View of the Transcriptome
Page 78: RNA-seq: A High-resolution View of the Transcriptome
Page 79: RNA-seq: A High-resolution View of the Transcriptome

Fusion Gene Detection

Page 80: RNA-seq: A High-resolution View of the Transcriptome

Fusion gene schematic

Page 81: RNA-seq: A High-resolution View of the Transcriptome
Page 82: RNA-seq: A High-resolution View of the Transcriptome

Fusion Detection

Page 83: RNA-seq: A High-resolution View of the Transcriptome

False Positive Fusion Detection

Page 84: RNA-seq: A High-resolution View of the Transcriptome

Experimental Design

• What are my goals?
  – Differential expression?
  – Transcriptome assembly?
  – Identify rare, novel transcripts?
• System characteristics?
  – Large, expanded genome?
  – Intron/exon structures complex?
  – No reference genome or transcriptome?

Page 85: RNA-seq: A High-resolution View of the Transcriptome

Experimental Design

• Technical replicates
  – Probably not needed due to low technical variation
• Biological replicates
  – Not explicitly needed for transcript assembly
  – Essential for differential expression analysis
  – Number of replicates often driven by sample availability for human studies
  – More is almost always better

Page 86: RNA-seq: A High-resolution View of the Transcriptome

Links of Interest

• http://bioconductor.org
• http://biostars.org
• http://www.rna-seqblog.com/
• https://genome.ucsc.edu/ENCODE/
• http://www.ncbi.nlm.nih.gov/gds/

Page 87: RNA-seq: A High-resolution View of the Transcriptome
Page 88: RNA-seq: A High-resolution View of the Transcriptome

Visualizing Splicing