RNA-seq (teaching.healthtech.dtu.dk/material/22126/NGS_RNA-seq...)

DTU Aqua, 14 June 2019

Francesca Bertolini, DTU Aqua

franb@aqua.dtu.dk

RNA-seq

SUMMARY

1. Definitions

2. Sample collections and RNA integrity

3. Library preparation

4. Data analyses

– Read mapping/assembly

– Normalization

– Read count

– Differential expression

– Functional enrichment

TRANSCRIPTOME

Complete set of transcripts in a cell and their quantity, for a specific developmental stage or physiological condition.

RNA classification

• Ribosomal RNA (rRNA): catalytic component of ribosomes (about 80-85% of total RNA)

• Transfer RNA (tRNA): transfers amino acids to the growing polypeptide chain at the ribosomal site of protein synthesis (about 15%)

• Coding RNA (mRNA): carries information about a protein sequence to the ribosomes (about 5%)

• Other non-coding regulatory RNAs

Other non-coding regulatory RNAs

Delpu et al. 2016, Drug Discovery in Cancer Epigenetics

Long RNAs: splicing

[Figure: splicing diagram with labels DNA, RNA, mRNA, lncRNA]

RNA-seq

• Abundance estimation/differential expression

• Alternative splicing

• RNA editing

• Novel transcripts

• Allele specific expression

• Fusion transcripts

High-throughput sequencing technology used for probing the transcriptome of a sample

2. Sample collections and RNA integrity

Before RNA extraction

RNA is less stable than DNA, so extra precautions are needed to avoid degradation.

TISSUE COLLECTION:

• Liquid nitrogen

• RNAlater (for solid tissues)

• Tempus/PAXgene tubes (for blood)

After RNA extraction

RIN (RNA integrity number): algorithm for assigning integrity values to RNA measurements.

10: maximum (intact RNA)

1: minimum (fully degraded RNA)

RIN > 7 is generally considered acceptable.

RNA quality (RIN) and quantification: Bioanalyzer

3. Library preparation

Different steps for different RNAs

• Total RNA-seq: DNase treatment, ribosomal depletion, fragmentation, library preparation

• mRNA + lncRNA (polyA+) RNA-seq: DNase treatment, polyA enrichment, fragmentation, library preparation

• Short RNA-seq: DNase treatment, size selection, library preparation

LIBRARY PREPARATION

LIBRARY PREPARATION… with third-generation sequencing

4. Data analyses

Transcriptome assembly strategies

Reference-based

De novo

Pseudoalignment

Martin and Wang 2011, Nature Reviews Genetics

Reference-based: Most common tools

• Unspliced read aligners: BWA, Bowtie2, Novoalign

– Splice junctions not considered

– Ideal for mapping against cDNA databases

• Spliced read aligners: TopHat2/HISAT2, STAR

– Novel splice junctions detected

– Better performance for polymorphic regions and pseudogenes

REFERENCE-GUIDED ASSEMBLY (Cufflinks/StringTie)

1) First, map all the reads from your experiment to the reference sequence.

2) Then, in a second step, use the mapped reads to assemble potential transcripts and identify the genomic locations of introns and exons.
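As a rough illustration of how intron locations can be read off spliced alignments, the sketch below scans a CIGAR string for `N` operations (skipped reference regions). It is a minimal example and not how Cufflinks or StringTie work internally; the function name `introns_from_cigar` and the example read are made up for illustration.

```python
import re

# CIGAR operations that consume reference bases (per the SAM format).
REF_CONSUMING = set("MDN=X")

def introns_from_cigar(start, cigar):
    """Return (intron_start, intron_end) pairs, 0-based half-open.

    `start` is the 0-based leftmost reference position of the alignment.
    """
    introns = []
    pos = start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":  # skipped region: the read spans an intron here
            introns.append((pos, pos + length))
        if op in REF_CONSUMING:
            pos += length
    return introns

# Hypothetical read: 50 aligned bases, a 1000-base intron, 50 more aligned bases.
print(introns_from_cigar(100, "50M1000N50M"))  # [(150, 1150)]
```

Junctions supported by many reads are the ones that show up as arcs in a genome browser.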

Splice junctions viewed in IGV (Integrative Genomics Viewer)

De novo assembly: Most common tools

• Velvet: genomics and transcriptomics

• Trinity: transcriptomics

KALLISTO: PSEUDOALIGNMENT

• Most RNA-seq tools perform the analysis in two steps:

– Alignment

– Quantification

• Kallisto fuses the two steps

N. Bray et al., Nature Biotechnology (2016)

Target de Bruijn Graph (T-DBG)

[Figure: T-DBG diagram, panels a to e]

• Create every k-mer in the transcriptome, build the de Bruijn graph and mark each k-mer with the transcripts that contain it

• Preprocess the transcriptome to create the T-DBG

• Indexing is faster

• Use the k-mers in a read to find which transcript it came from

• Pseudoalignment: the set of transcripts the read (pair) is compatible with

• Each k-mer appears in a set of transcripts

• The intersection of all sets is our pseudoalignment

http://arxiv.org/pdf/1505.02710v2.pdf
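The intersection idea can be sketched in a few lines. This is a toy version under simplifying assumptions: it uses a plain k-mer dictionary instead of a real de Bruijn graph, it does not skip redundant k-mers, and it does no quantification. The sequences and the names `build_kmer_index` and `pseudoalign` are invented for illustration and are not part of kallisto.

```python
from collections import defaultdict

def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def pseudoalign(read, index, k):
    """Intersect the transcript sets of all k-mers in the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], set())
        compatible = hits if compatible is None else compatible & hits
        if not compatible:  # a k-mer with no shared transcript: stop early
            return set()
    return set(compatible) if compatible else set()

# Toy transcriptome (made-up sequences): t1 and t2 share a prefix.
transcripts = {
    "t1": "ACGTACGGTCA",
    "t2": "ACGTACGCTTA",
    "t3": "TTTGGGCCCAA",
}
index = build_kmer_index(transcripts, k=5)
print(sorted(pseudoalign("ACGTACG", index, k=5)))  # ['t1', 't2']
print(sorted(pseudoalign("TACGGTC", index, k=5)))  # ['t1']
```

The first read falls in the shared prefix, so it stays compatible with both transcripts; the second covers the position where they differ and resolves to one.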

NORMALIZATION

• Longer genes will have more reads mapping to them (within samples)

• A sequencing run with more depth will have more reads mapping to each gene (between samples)

MAIN FACTORS DURING NORMALIZATION

• Sequencing depth

• Gene length

• RNA composition (Anders and Huber, 2010, Genome Biol.)

Common normalization methods

TPM (transcripts per kilobase million)

– Description: counts per length of transcript (kb) per million reads mapped

– Accounted factors: sequencing depth and gene length

– Recommendations for use: gene count comparisons within a sample or between samples of the same sample group; NOT for DE analysis

RPKM/FPKM (reads/fragments per kilobase of exon per million reads/fragments mapped)

– Description: similar to TPM

– Accounted factors: sequencing depth and gene length

– Recommendations for use: gene count comparisons between genes within a sample; NOT for between-sample comparisons or DE analysis

DESeq2's median of ratios

– Description: counts divided by sample-specific size factors determined by the median ratio of gene counts relative to the geometric mean per gene

– Accounted factors: sequencing depth and RNA composition

– Recommendations for use: gene count comparisons between samples and for DE analysis; NOT for within-sample comparisons
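The three methods can be written out as a small sketch to make the differences concrete. This is an illustration of the formulas as described above, using only the Python standard library; it is not the DESeq2 implementation, and the function names and toy counts are assumptions made for the example.

```python
import math
from statistics import median

def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads, for one sample."""
    per_million = sum(counts) / 1e6
    return [c / per_million / (l / 1e3) for c, l in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """TPM: divide by length first, then rescale so the sample sums to 1e6."""
    rates = [c / (l / 1e3) for c, l in zip(counts, lengths_bp)]
    scale = sum(rates) / 1e6
    return [r / scale for r in rates]

def size_factors(samples):
    """Median-of-ratios size factors; `samples` is a list of count lists."""
    n_genes = len(samples[0])
    # Log geometric mean per gene; genes with any zero count are excluded.
    log_geo_means = []
    for g in range(n_genes):
        values = [sample[g] for sample in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = []
    for sample in samples:
        log_ratios = [math.log(sample[g]) - log_geo_means[g]
                      for g in range(n_genes) if log_geo_means[g] is not None]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Toy data: sample B is sample A sequenced twice as deep.
lengths = [1000, 2000, 3000]
sample_a = [10, 20, 30]
sample_b = [20, 40, 60]
print(tpm(sample_a, lengths))              # equal values: counts scale with length
print(size_factors([sample_a, sample_b]))  # factor for B is twice that for A
```

Dividing each sample's counts by its size factor puts the two toy samples on the same scale, which is the property needed before a between-sample comparison.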

READ COUNT

Count the number of reads aligned to each known transcript/isoform, e.g. with HTSeq-count.
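The counting step can be pictured with a toy example. The sketch below assigns each read to a gene only if it overlaps exactly one gene interval. This is loosely modelled on the idea of setting aside ambiguous reads and is not HTSeq-count's actual algorithm; the gene coordinates, reads and function name are invented.

```python
from collections import Counter

def count_reads(reads, genes):
    """Count reads per gene; intervals are 0-based half-open (chrom, start, end)."""
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = [name for name, (g_chrom, g_start, g_end) in genes.items()
                if g_chrom == chrom and start < g_end and end > g_start]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts

# Hypothetical annotation with two overlapping genes on chr1.
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 450, 900)}
reads = [
    ("chr1", 120, 170),  # inside geneA only
    ("chr1", 460, 510),  # overlaps both genes
    ("chr1", 600, 650),  # inside geneB only
    ("chr2", 10, 60),    # no annotated gene
]
counts = count_reads(reads, genes)
for name, n in counts.items():
    print(name, n)
```

The resulting per-gene counts are the raw input for normalization and differential expression.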

DIFFERENTIAL EXPRESSION

FUNCTIONAL ENRICHMENT ANALYSIS

Identification of classes of genes that are over-represented among the differentially expressed genes, and may be associated with the disease/phenotype investigated.

The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:

• Molecular function: molecular activities of gene products

• Cellular component: where gene products are active

• Biological process: pathways and larger processes made up of the activities of multiple gene products
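One common way to test over-representation of a GO term is a one-sided hypergeometric test. The sketch below is a minimal version assuming a fixed gene background and no multiple-testing correction; the function name and the numbers are illustrative and do not describe how any particular website computes its scores.

```python
from math import comb

def enrichment_pvalue(background, annotated, de_genes, de_annotated):
    """P(X >= de_annotated) under the hypergeometric distribution.

    background:   number of genes considered in total
    annotated:    background genes carrying the GO term
    de_genes:     differentially expressed genes
    de_annotated: DE genes carrying the GO term
    """
    total = comb(background, de_genes)
    upper = min(annotated, de_genes)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, de_genes - i)
        for i in range(de_annotated, upper + 1)
    )
    return tail / total

# Made-up numbers: 200 of 20000 genes carry the term, 500 genes are DE,
# and 15 of those carry the term (about 5 would be expected by chance).
print(enrichment_pvalue(20000, 200, 500, 15))
```

In practice one such test is run per GO term, so the p-values need correcting for multiple testing.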

Some GO and pathway analysis websites

http://amp.pharm.mssm.edu/Enrichr/

http://cbl-gorilla.cs.technion.ac.il/

https://david.ncifcrf.gov/

https://cytoscape.org/