Page 1: Obstacles and challenges in the analysis of microRNA sequencing …bioinformatics.org.au/ws15/wp-content/uploads/ws14/sites/9/2012/1… · Kim et al., (2012) Molecular Cell 46, 893-895

Obstacles and challenges in the analysis of microRNA sequencing data

(miRNA-Seq)

David Humphreys

Genomics core

Dr Victor Chang AC 1936-1991, Pioneering Cardiothoracic Surgeon and Humanitarian

Page 2

The ABCs about miRNAs (Annotation, Biogenesis, Curation)

www.mirbase.org
• Mature fasta file
• Stem loop fasta file
• Gff (genome coordinate file)
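As a minimal sketch of how the mature FASTA file might be loaded for one species, the snippet below reads FASTA records and keeps those whose ID starts with a species prefix. It assumes the usual miRBase layout, in which the ID is the first word of the header and sequences use the RNA alphabet. The two in-memory records stand in for the real file, and the word ACCESSION is a placeholder, not a real accession.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def mature_for_species(handle, prefix="hsa"):
    """Return {miRNA ID: sequence in DNA alphabet} for one species prefix."""
    mature = {}
    for header, seq in read_fasta(handle):
        mirna_id = header.split()[0]  # assumption: ID is the first word
        if mirna_id.startswith(prefix + "-"):
            # Reads come off the sequencer as DNA, so convert U to T.
            mature[mirna_id] = seq.upper().replace("U", "T")
    return mature


# Stand-in for the mature FASTA file (ACCESSION is a placeholder).
example = io.StringIO(
    ">hsa-let-7f-5p ACCESSION Homo sapiens let-7f-5p\n"
    "UGAGGUAGUAGAUUGUAUAGUU\n"
    ">mmu-let-7f-5p ACCESSION Mus musculus let-7f-5p\n"
    "UGAGGUAGUAGAUUGUAUAGUU\n"
)
human = mature_for_species(example, prefix="hsa")
print(human)
```

With a downloaded file, the same function could be called on `open(path)` instead of the in-memory example.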

Page 3

miRNA-Seq applications

Read length covers entire mature transcript

Discovery

- Novel miRNAs

- Isoforms

- Biogenesis
iii) Non-canonical processing
iv) Strand selection
v) Length / non-template additions

Quantification

- Differentially expressed miRNAs

- Differential processing

Page 4

Experimental design

• Sample selection
- Species, replicates

• RNA extraction

• Library preparation

Page 5

Kim et al., (2011)Molecular Cell 43, 1005-1014

Low confluence = 500,000 cells
High confluence = 800,000 cells

Cell number: (L) = 200,000; (H) = 800,000

RNA extraction

                     Column   Liquid   Bead
Prep time            ++       ++++     +++
miRNA purification   +++      ++++     ++++
Recovery             ++++     +++      +++

Limitations/pitfalls: low input miRNA bias; early protocols no miRNA; ???

Kim et al., (2012)Molecular Cell 46, 893-895

NO change!!

[Figure axis label: Ratio 141/200c]

Down-regulated miRNAs: 141, 29b, 21, 106b, 15a, 34a

• Most susceptible:
- Low GC content
- Secondary structure

• Small RNAs precipitate with longer RNA
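Since low GC content is named as a risk factor for extraction loss, a quick screen is to rank mature sequences by GC fraction. The sketch below does this for two sequences that appear elsewhere in these slides (let-7f-5p and miR-133a-3p). The function name and the idea of ranking are illustrative assumptions, not part of the cited studies.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (DNA or RNA alphabet)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


# Sequences copied from other slides in this deck.
sequences = {
    "let-7f-5p": "TGAGGTAGTAGATTGTATAGTT",
    "miR-133a-3p": "UUUGGUCCCCUUCAACCAGCUG",
}

# Lowest GC first: by the slide's argument these would be the
# candidates most at risk of loss at low input.
ranked = sorted(sequences, key=lambda name: gc_fraction(sequences[name]))
for name in ranked:
    print(f"{name}\t{gc_fraction(sequences[name]):.2f}")
```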

Page 6

RNA quantification and integrity

Nano drop Qubit Agilent

seqanswers.com/forums/showthread.php?t=21280

WARNING!
- Accuracy poor below 50 ng/ul
- Careful of concentrations > 1 ug/ul

WARNING!
- Known biases in quantifying ssRNA < 50 ng/ul

WARNING!
- Quantification only accurate in the defined range (read manual)

Assays specific for DNA/RNA
Quantitate size
Can detect salt & other contaminants

[Figure: absorbance at 230, 260 and 280]

Page 7

Library prep kit comparison

Sample prep

Adaptor ligation

RT (Reverse Transcription)

PCR

P-miRNA-OH

i) Hybridisation
ii) Ligation
iii) Denaturation

Sequential Ligation

# Hafner et al., (2011) RNA 17(9), 1-16

# Sequence
# Temperature
# Incubation times
# PCR cycles … OK
# Input amount
# pH, buffers/salts/ATP

Page 8

Summary

• Sample selection
- Species, replicates

• RNA extraction
- Use same method for all preps
- Quantify (2 methods)
- Assess integrity

• Library preparation
- Consistent input
- Consistent ligation conditions (time/temperature)
- Use same kits

Page 9

miRNA-Seq Bioinformatics

(Trim - ALIGN – Report)
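To make the trim step concrete, here is a minimal sketch of exact-match 3' adapter removal followed by a length filter. The adapter shown is, to the best of my knowledge, the Illumina small RNA 3' adapter, but treat it as an assumption and confirm it against your kit. The 17 nt cut-off, the 8-base minimum overlap and the function name are also illustrative. Dedicated trimmers such as cutadapt additionally tolerate sequencing errors in the adapter, which this sketch does not.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # assumption: check your library kit
MIN_LEN = 17                       # assumption: shortest read worth aligning


def trim_adapter(read, adapter=ADAPTER, min_overlap=8):
    """Return the read with an exact-match 3' adapter removed.

    A complete adapter is searched for first. Failing that, a partial
    adapter of at least min_overlap bases at the very 3' end is removed.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    return read


# A 36 nt read: a 22 nt miRNA followed by the first 14 adapter bases.
insert = "TGAGGTAGTAGATTGTATAGTT"
read = insert + ADAPTER[:14]
trimmed = trim_adapter(read)
keep = len(trimmed) >= MIN_LEN
print(trimmed, keep)
```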

Page 10

Anscombe’s Quartet

• Maths is a tool for analysis.
• You can blindly ignore biases and errors in data sets.

- mean, stdev, variance, correlation are the same!

Image from wikipediahttps://en.wikipedia.org/wiki/Anscombe%27s_quartet
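The point can be checked numerically. The sketch below recomputes the summary statistics for the four published Anscombe data sets using only the Python standard library. The helper name `pearson` is my own.

```python
from math import sqrt
from statistics import mean, variance

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8] * 7 + [19] + [8] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs)
    sy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sx * sy)


summary = {
    name: (mean(xs), variance(xs), mean(ys), variance(ys), pearson(xs, ys))
    for name, (xs, ys) in quartet.items()
}
for name, stats in summary.items():
    print(name, [round(value, 2) for value in stats])
```

The four rows agree to about two decimal places even though the scatter plots look nothing alike, which is why plotting the data matters.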

Page 11

Challenges

Multimappers

Mismatches

Aligners

Sharing data

• Length of a sequence read covers entire microRNA transcript

• Upstream bias will have impacts on analysis

Sample preparation

Library preparation

Clonal amplification

Sequencing

Bioinformatics

Normalisation

Differential expression

Feature counting

Visualisation

Page 12

Choice of reference?

Genome                                 miRBase stem-loop
Better discovery                       Limited discovery
Possible incorrect/loss of mappings    Forced (biased) mapping
Slower, computationally restrictive?   Faster, less complicated

Page 13

Multi-mappers (1)

• miRBase does NOT ACCURATELY report number of times a read aligns to genome

• Multi-loci miRBase entries provide some information

[Figure: "Human multi-mappers #". x-axis: number of mapped locations (0 to > 100). y-axis: number of miRs (0 to 200). Example highlighted: miR-486.]

# Human miRBase entries mapped using bowtie aligner allowing all multi-mappers

Page 14

Multi-mappers (2)

• Multi-mapping rate increases as read length decreases.

• What should the minimum miRNA read length be?

• Shortest length in miRBase is 17 nt!

miR-133 family

miR-133a-1-3p uuugguccccuucaaccagcug

miR-133a-2-3p uuugguccccuucaaccagcug

miR-133b-1-3p uuugguccccuucaaccagcua
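The effect of read length on multi-mapping can be illustrated with the two distinct miR-133 sequences listed above, which differ only at the final base. The sketch below truncates a read from its 3' end and lists which family members it is still compatible with. The prefix-matching helper is an illustrative simplification of what an aligner does.

```python
family = {
    "miR-133a-1-3p": "uuugguccccuucaaccagcug",
    "miR-133b-1-3p": "uuugguccccuucaaccagcua",
}


def compatible(read, references):
    """Names of references that begin with the read (exact match)."""
    return sorted(name for name, seq in references.items()
                  if seq.startswith(read))


full_read = family["miR-133a-1-3p"]
results = {}
for length in (22, 21, 17):
    results[length] = compatible(full_read[:length], family)
    print(length, results[length])
```

At 22 nt the read is unique to miR-133a; losing just one base from the 3' end makes it ambiguous between both family members.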

• Where do you assign multi-loci counts?

- Assign to each position?

- Assign fraction to each position?

- Intelligently assign to a position?

- Ignore?
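The options above can be compared on a toy example. In this sketch each read is represented by the list of loci it aligned to. The strategy names and the loci "A" and "B" are hypothetical. The "rescue" function is only one possible reading of "intelligently assign" (weighting multi-mapped reads by the uniquely mapped reads at each locus), not necessarily the method used by the author.

```python
from collections import Counter


def assign_counts(read_hits, strategy):
    """Count reads per locus under a simple multi-mapper strategy."""
    counts = Counter()
    for loci in read_hits:
        if strategy == "each":          # full count to every position
            for locus in loci:
                counts[locus] += 1
        elif strategy == "fraction":    # split the read evenly
            for locus in loci:
                counts[locus] += 1 / len(loci)
        elif strategy == "ignore":      # keep uniquely mapped reads only
            if len(loci) == 1:
                counts[loci[0]] += 1
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return counts


def assign_rescue(read_hits):
    """Weight multi-mappers by unique-read evidence at each locus."""
    unique = assign_counts(read_hits, "ignore")
    counts = Counter(unique)
    for loci in read_hits:
        if len(loci) > 1:
            total = sum(unique[locus] for locus in loci)
            for locus in loci:
                # Fall back to an even split if no locus has unique reads.
                weight = unique[locus] / total if total else 1 / len(loci)
                counts[locus] += weight
    return counts


# Toy data: 3 reads unique to A, 1 unique to B, 2 mapping to both.
hits = [["A"], ["A"], ["A"], ["B"], ["A", "B"], ["A", "B"]]
for strategy in ("each", "fraction", "ignore"):
    print(strategy, dict(assign_counts(hits, strategy)))
print("rescue", dict(assign_rescue(hits)))
```

Note that "each" inflates the total above the number of reads, while "ignore" discards data; the choice changes downstream normalisation.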

miR-133a

miR-133b

Page 15

Mismatches

• Sequencing variants

i) Error in library prep

ii) Variants in reference genome

iii) Sequencer

• RNA editing

Type         Enzyme   Comment
A to I (G)   ADAR     Predominantly on pre-miRs
C to T       APOBEC   Not identified yet?

Chawla et al., (2014) Nucleic Acids Research, 42 (8): 5245–5255

Tomaselli et al., (2013) Int. J. Mol. Sci. 14, 22796-22816

Ohanian et al. (2013) BMC Genetics, 14:18
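A first step in separating these sources of mismatch is simply to tabulate which substitutions occur and where. The sketch below compares a read with its reference over an ungapped, same-length alignment and flags A-to-G and C-to-T changes as candidates for the editing types in the table. The function names are hypothetical, and a single mismatch cannot by itself distinguish editing from a sequencing error or a genomic variant.

```python
def mismatches(read, reference):
    """List (position, ref_base, read_base) for an ungapped alignment.

    Positions are 1-based; both sequences must have the same length.
    """
    if len(read) != len(reference):
        raise ValueError("sequences must be the same length")
    pairs = zip(reference.upper(), read.upper())
    return [(i, ref, obs) for i, (ref, obs) in enumerate(pairs, start=1)
            if ref != obs]


def label(ref_base, read_base):
    """Name the substitution class for one mismatch."""
    if (ref_base, read_base) == ("A", "G"):
        return "candidate A-to-I editing (read as G)"
    if (ref_base, read_base) == ("C", "T"):
        return "candidate C-to-T change"
    return "other mismatch"


reference = "TGAGGTAGTAGATTGTATAGTT"
read = "TGGGGTAGTAGATTGTATAGTT"   # hypothetical read, A>G at position 3
found = mismatches(read, reference)
for position, ref_base, read_base in found:
    print(position, ref_base, read_base, label(ref_base, read_base))
```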

Page 16

Aligners

• (Too) Many choices…

• Each aligner has a wide array of options with DIFFERENT default settings.

• Bowtie aligner provides error rate and multi-mapping control:

bowtie -p 4 -n 1 -l 21 --nomaqround -k 10 --best --strata --chunkmbs 256

-k 10: report up to 10 multi-mappers

-n 1 -l 21: allow 1 mismatch within the first 21 nt (the seed)

Fastq calibration dataset:

hsa-let-7f-5p_M_chr9_94176353_94176374_+#chrX_53557246_53557267_- 0 chr9 94176353 255 22M * 0 0 TGAGGTAGTAGATTGTATAGTT

• Available for ALL species present in miRBase, features include:

i) Each header defines miRBase mapping location

ii) Contains all miRBase entries with all single nucleotide mismatches

Header fields: miRNA ID, mapping location #1, mapping location #2
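Because each read name carries its expected mapping locations, an aligner's output can be checked automatically. The sketch below parses the header layout inferred from the single example on this slide (ID, a tag, then one or more chrom_start_end_strand locations separated by "#") and tests whether a SAM record landed on one of them. The meaning of the "M" tag is not stated in the slides, so it is carried through unchanged, and the function names are hypothetical.

```python
def parse_header(name):
    """Split a calibration read name into (miRNA ID, tag, locations)."""
    first, *others = name.split("#")
    mirna_id, tag, first_location = first.split("_", 2)
    locations = []
    for location in [first_location, *others]:
        # rsplit keeps chromosome names that contain underscores intact.
        chrom, start, end, strand = location.rsplit("_", 3)
        locations.append((chrom, int(start), int(end), strand))
    return mirna_id, tag, locations


def is_expected(sam_line):
    """True if a SAM record maps to a location named in its read name."""
    fields = sam_line.split()
    name, flag, chrom, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
    strand = "-" if flag & 16 else "+"   # SAM flag 0x10 = reverse strand
    _, _, locations = parse_header(name)
    return any((chrom, pos, strand) == (c, start, s)
               for c, start, _end, s in locations)


sam = ("hsa-let-7f-5p_M_chr9_94176353_94176374_+#chrX_53557246_53557267_- "
       "0 chr9 94176353 255 22M * 0 0 TGAGGTAGTAGATTGTATAGTT")
print(parse_header(sam.split()[0]))
print(is_expected(sam))
```

Running this over every record of an aligned calibration set would give a per-aligner, per-setting tally of correct and incorrect placements.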

Page 17

Non template additions (NTA)

i) Adenylation

ii) Uridylation

Koppers-Lalic et al., (2014), Cell Reports 8, 1649–1658

DETECTION METHODS:

• Aligners tend to softclip 3’ mismatches!!

• Remove adaptor
- Hard trim (18nt)
- Extend alignment
- Look for mismatch clusters at end of read

<miRNA seq> + (A)n

<miRNA seq> + (T)n
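The two patterns above can be detected by comparing the 3' end of an adaptor-trimmed read with the annotated mature sequence. This is a minimal sketch that assumes the read starts at the annotated 5' end. The optional `flank` argument (genomic sequence immediately 3' of the mature miRNA) is a hypothetical addition used to separate templated extensions from true non-template additions, and the example flank is made up.

```python
def classify_tail(read, mature, flank=""):
    """Classify bases found 3' of the annotated mature sequence.

    Returns (tail, label), or None if the read does not start with
    the mature sequence. Sequences are in the DNA alphabet.
    """
    if not read.startswith(mature):
        return None
    tail = read[len(mature):]
    if not tail:
        return ("", "no addition")
    if flank.startswith(tail):
        return (tail, "templated extension")
    if set(tail) == {"A"}:
        return (tail, "adenylation")
    if set(tail) == {"T"}:
        return (tail, "uridylation")   # U is read as T
    return (tail, "other")


mature = "TGAGGTAGTAGATTGTATAGTT"
print(classify_tail(mature + "AA", mature))
print(classify_tail(mature + "T", mature))
print(classify_tail(mature + "T", mature, flank="TACG"))  # made-up flank
```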

Page 18

Assigning miRNA counts

Mature miRNA analysis

i) 5’ isomirs
ii) 3’ isomirs
iii) Non canonical
iv) Arm switching
v) Length
vi) Editing

Cistronic analysis (i) (ii)

Humphreys et al., 2013, NAR
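One simple way to sort reads into some of the categories above is to compare each read's aligned start and end on the stem-loop with the annotated mature coordinates. The sketch below is an assumption about how such a classification could be done, not a description of the published method, and the coordinates in the example are hypothetical.

```python
def classify_isomir(read_start, read_end, mature_start, mature_end):
    """Label a read by its offsets from the annotated mature miRNA.

    All coordinates are positions on the same stem-loop, same strand,
    numbered 5' to 3'. Offsets are read minus annotation.
    """
    labels = []
    if read_start != mature_start:
        labels.append(f"5' isomiR ({read_start - mature_start:+d})")
    if read_end != mature_end:
        labels.append(f"3' isomiR ({read_end - mature_end:+d})")
    if (read_end - read_start) != (mature_end - mature_start):
        labels.append("length variant")
    return labels or ["canonical"]


# Hypothetical annotation: mature miRNA at stem-loop positions 8..29.
print(classify_isomir(8, 29, 8, 29))
print(classify_isomir(9, 29, 8, 29))
print(classify_isomir(8, 30, 8, 29))
```

A changed 5' end shifts the seed, so 5' isomiRs are usually reported separately from 3' variants.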

Page 19

miRspring

• Small (<2MB) HTML document that replicates the miRNA aligned sequencing data.

• Needs NO internet connectivity.

• Provides visualization of sequence data + research tools == complete transparency.

http://miRspring.victorchang.edu.au

Humphreys D.T., and Suter C.M. Nucleic Acids Research 2013.

Page 20

Cumulative distribution of miRNA reads

Sampling bias!

TissueAtlas: Heart, Kidney, Liver, Lung, Ovary, Spleen, Testes, Thymus, Brain, Placenta

AGO IP: THP-1

ENCODE: HeLa S3, A549, Ag04450, Bj, Gm1287, H1hesc, HepG2, Huvec, K562, MCF7, NheK, Sknshra

• 73 miRspring documents

• 895 million sequence tags

• < 55 megabytes of disk space

In most cell lines and tissues the most abundant miRNA should comprise < 35% of all aligned miRNA sequences.

OK ☺
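This rule of thumb is easy to turn into a quality-control check. The sketch below takes a table of per-miRNA read counts and reports whether the most abundant miRNA stays under the 35% guideline. The function names and the counts are hypothetical.

```python
def top_mirna_fraction(counts):
    """Return (name, fraction) for the most abundant miRNA."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no aligned miRNA reads")
    name = max(counts, key=counts.get)
    return name, counts[name] / total


def passes_dominance_check(counts, threshold=0.35):
    """True if the top miRNA is below the threshold fraction."""
    return top_mirna_fraction(counts)[1] < threshold


# Hypothetical count tables for two libraries.
balanced = {"miR-A": 300, "miR-B": 250, "miR-C": 250, "miR-D": 200}
skewed = {"miR-A": 700, "miR-B": 200, "miR-C": 100}

for label, table in (("balanced", balanced), ("skewed", skewed)):
    name, fraction = top_mirna_fraction(table)
    print(label, name, f"{fraction:.0%}", passes_dominance_check(table))
```

A failing library is not necessarily wrong, but it is a prompt to look for sampling or ligation bias before comparing it with other samples.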

Page 21

Top 100 miRNAs typically:
- 22nt long
- Good correlation with miRBase

Page 22

Conclusions

• Many challenges in miRNA-seq analysis
- Multi-mappers
- Mismatches

• Best practices… be methodical
- Know the question you wish to address
- Know your species (reference/miRBase)
- Know your aligner
- Test your pipeline!
- Know what you are missing
- Quality control metrics/visualisation

Page 23

Joshua Ho

Peter Szot

Catherine Suter

Diane Fatkin

Thomas Priess

St Vincent’s Hospital

Chris Hayward

Kavitha

Andrew Jabbour

If you would like a miRBase test data set for any species/reference combination, please don’t hesitate to contact me.

[email protected]

miRspring.victorchang.edu.au

- Fastq synthetic data sets

- Intelligently assign multi-mappers

- R objects