Post on 05-Aug-2020
Obstacles and challenges in the analysis of microRNA sequencing data (miRNA-Seq)
David Humphreys
Genomics core
Dr Victor Chang AC 1936-1991, Pioneering Cardiothoracic Surgeon and Humanitarian
The ABCs about miRNAs (Annotation, Biogenesis, Curation)
www.mirbase.org
• Mature fasta file
• Stem loop fasta file
• Gff (genome coordinate file)
miRNA-Seq applications
Read length covers entire mature transcript
Discovery
- Novel miRNAs
- Isoforms
- Biogenesis
iii) Non-canonical processing
iv) Strand selection
v) Length / non-template additions
Quantification
- Differentially expressed miRNAs
- Differential processing
Experimental design
• Sample selection
- Species, replicates
• RNA extraction
• Library preparation
Kim et al., (2011)Molecular Cell 43, 1005-1014
Low confluence = 500,000 cells
High confluence = 800,000 cells
Cell number: (L) = 200,000, (H) = 800,000
RNA extraction
                      Column   Liquid   Bead
Prep time             ++       ++++     +++
miRNA purification    +++      ++++     ++++
Recovery              ++++     +++      +++
Limitations/pitfalls:
- Low input miRNA bias
- Early protocols no miRNA
- ???
Kim et al., (2012)Molecular Cell 46, 893-895
NO change!!
[Figure axis: Ratio 141/200c]
Down regulated miRNAs:
141, 29b , 21, 106b, 15a, 34a
• Most susceptible:
- Low GC content,
- Secondary structure
• Small RNAs precipitate with longer RNA
RNA quantification and integrity
NanoDrop, Qubit, Agilent
seqanswers.com/forums/showthread.php?t=21280
WARNING!
- Accuracy poor below 50 ng/µl
- Careful of concentrations > 1 µg/µl
WARNING!
- Known biases in quantifying ssRNA < 50 ng/µl
WARNING!
- Quantification only accurate in the defined range (read the manual)
• Assays specific for DNA/RNA
• Quantitate size
• Can detect salt & other contaminants
[Figure: absorbance spectrum with 230, 260 and 280 nm marked]
Library prep kit comparison
Sample prep
Adaptor ligation
RT (Reverse Transcription)
PCR
P-miRNA-OH
i) Hybridisation
ii) Ligation
iii) Denaturation
Sequential ligation
# Hafner et al., (2011) RNA 17(9), 1-16
# Sequence
# Temperature
# Incubation times
# PCR cycles … OK
# Input amount
# pH, buffers/salts/ATP
Summary
• Sample selection
- Species, replicates
• RNA extraction
- Use same method for all preps
- Quantify (2 methods)
- Assess integrity
• Library preparation
- Consistent input
- Consistent ligation conditions (time/temperature)
- Use same kits
miRNA-Seq Bioinformatics
(Trim - ALIGN – Report)
Anscombe’s Quartet
• Maths is a tool for analysis.
• You can blindly ignore biases and errors in data sets.
- mean, stdev, variance, correlation are the same!
Image from Wikipedia: https://en.wikipedia.org/wiki/Anscombe%27s_quartet
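A minimal sketch of the point above: the four Anscombe data sets (values as published in Anscombe's 1973 paper and reproduced on the Wikipedia page) give near-identical summary statistics. The helper names `pearson` and `summarise` are made up for this illustration.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarise(xs, ys):
    """Means, sample variances and correlation for one data set."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }


SUMMARY = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARY.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The printed rows agree to two decimal places, yet the scatter plots look nothing alike, which is why the data need to be visualised before the statistics are trusted.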
Challenges
Multimappers
Mismatches
Aligners
Sharing data
• Length of a sequence read covers entire microRNA transcript
• Upstream bias will have impacts on analysis
Sample preparation
Library preparation
Clonal amplification
Sequencing
Bioinformatics
Normalisation
Differential expression
Feature counting
Visualisation
Choice of reference?
Genome                                  miRBase stem-loop
Better discovery                        Limited discovery
Possible incorrect/loss of mappings     Forced (biased) mapping
Slower, computationally restrictive?    Faster, less complicated
miR-486
Multi-mappers (1)
• miRBase does NOT ACCURATELY report number of times a read aligns to genome
• Multi-loci miRBase entries provide some information
[Figure: Human multi-mappers #. Number of miRs (y-axis, 0 to 200) against number of mapped locations (x-axis, 0 to > 100); miR-486 labelled]
# Human miRBase entries mapped using the bowtie aligner, allowing all multi-mappers
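A rough sketch of how such a tally could be produced, assuming the aligner wrote one SAM line per reported alignment (as bowtie does when all multi-mappers are allowed). The read names and coordinates in the toy input are invented for illustration and are not real miRBase loci.

```python
from collections import Counter


def mapped_locations_per_read(sam_lines):
    """Count reported alignments for each read name in SAM-formatted lines."""
    hits = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 4:
            continue  # SAM flag bit 0x4 marks an unmapped read
        hits[name] += 1
    return hits


def location_histogram(hits):
    """Number of reads observed at each mapped-location count."""
    return Counter(hits.values())


# Toy input (hypothetical names and positions).
TOY_SAM = [
    "@HD\tVN:1.0",
    "miR-toy-1\t0\tchr1\t100\t255\t22M\t*\t0\t0\tACGT\tIIII",
    "miR-toy-1\t16\tchr2\t500\t255\t22M\t*\t0\t0\tACGT\tIIII",
    "miR-toy-2\t0\tchr3\t900\t255\t22M\t*\t0\t0\tACGT\tIIII",
    "miR-toy-3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

HITS = mapped_locations_per_read(TOY_SAM)
HISTOGRAM = location_histogram(HITS)
print(dict(HITS))
print(dict(HISTOGRAM))
```

If the aligner was run with a cap on reported alignments, the counts saturate at that cap, so the histogram is only meaningful when every alignment is reported.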
Example
Multi-mappers (2)
• Multi-mapping rate increases as read length decreases.
• What should the minimum miRNA read length be?
• Shortest length in miRBase is 17 nt!
miR-133 family
miR-133a-1-3p uuugguccccuucaaccagcug
miR-133a-2-3p uuugguccccuucaaccagcug
miR-133b-1-3p uuugguccccuucaaccagcua
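A small sketch using the miR-133 sequences listed above: it finds the first position at which two mature sequences differ, which sets the shortest read length that can still tell them apart. The function name `first_difference` is made up for this example.

```python
def first_difference(seq_a, seq_b):
    """1-based position of the first differing base, or None if none differs."""
    for position, (base_a, base_b) in enumerate(zip(seq_a, seq_b), start=1):
        if base_a != base_b:
            return position
    return None


MIR_133A = "uuugguccccuucaaccagcug"
MIR_133B = "uuugguccccuucaaccagcua"

POSITION = first_difference(MIR_133A, MIR_133B)
SAME_AT_21 = MIR_133A[:21] == MIR_133B[:21]
print(POSITION, SAME_AT_21)
```

The two sequences differ only at position 22, so a read trimmed to 21 nt or fewer maps equally well to miR-133a and miR-133b.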
• Where do you assign multi-loci counts?
- Assign to each position?
- Assign fraction to each position?
- Intelligently assign to a position?
- Ignore?
miR-133a
miR-133b
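The counting options listed above can be sketched as follows. This is an illustration only: the strategy names are invented here, and the "weighted" option is just one possible reading of "intelligently assign" (splitting a multi-mapped read in proportion to the uniquely mapped reads at each locus), not necessarily the method the author uses.

```python
from collections import defaultdict

STRATEGIES = ("each", "fraction", "weighted", "ignore")


def assign_counts(read_hits, strategy="fraction"):
    """Assign read counts to loci.

    read_hits maps a read id to the list of loci that read aligns to.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")

    # Uniquely mapped reads per locus, used as weights by "weighted".
    unique = defaultdict(float)
    for loci in read_hits.values():
        if len(loci) == 1:
            unique[loci[0]] += 1.0

    counts = defaultdict(float, unique)
    for loci in read_hits.values():
        if len(loci) < 2 or strategy == "ignore":
            continue  # unique reads are already counted; multi-mappers dropped
        total_unique = sum(unique[locus] for locus in loci)
        for locus in loci:
            if strategy == "each":
                counts[locus] += 1.0
            elif strategy == "fraction" or total_unique == 0:
                counts[locus] += 1.0 / len(loci)
            else:  # "weighted"
                counts[locus] += unique[locus] / total_unique
    return dict(counts)


# Toy input: hypothetical read ids and locus names.
TOY_READS = {
    "read1": ["locus_A", "locus_B"],
    "read2": ["locus_A"],
    "read3": ["locus_A"],
    "read4": ["locus_B"],
}

for name in STRATEGIES:
    print(name, assign_counts(TOY_READS, name))
```

On the toy input the same four reads give locus_A a count of 3, 2.5, about 2.67 or 2 depending on the strategy, which is why the choice needs to be reported alongside the counts.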
Mismatches
• Sequencing variants
i) Error in library prep
ii) Variants in reference genome
iii) Sequencer
• RNA editing
Type         Enzyme   Comment
A to I (G)   ADAR     Predominantly on pre-miRs
C to T       APOBEC   Not identified yet?
Chawla et al., (2014) Nucleic Acids Research, 42 (8): 5245–5255
Tomaselli et al., (2013) Int. J. Mol. Sci. 14, 22796-22816
Ohanian et al., (2013) BMC Genetics, 14:18
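As a hedged illustration of the table above, the sketch below lists mismatches between a reference and a read of equal length on the same strand and labels the substitution types that are consistent with editing. The labels are candidates only: a single read cannot separate editing from a genomic variant or a sequencing error. The function names and the mismatched read are invented for the example.

```python
def list_mismatches(reference, read):
    """Return (1-based position, reference base, read base) for each mismatch."""
    if len(reference) != len(read):
        raise ValueError("sequences must have equal length")
    pairs = zip(reference.upper(), read.upper())
    return [(pos, ref, obs) for pos, (ref, obs) in enumerate(pairs, start=1) if ref != obs]


def label_mismatch(ref_base, read_base):
    """Label a substitution by the editing type it would be consistent with."""
    if (ref_base, read_base) == ("A", "G"):
        return "candidate A-to-I (ADAR)"  # inosine is read as G
    if (ref_base, read_base) == ("C", "T"):
        return "candidate C-to-T (APOBEC)"
    return "other (variant or error)"


REFERENCE = "TGAGGTAGTAGATTGTATAGTT"  # let-7f-5p sequence from the calibration example
TOY_READ = "TGGGGTAGTAGATTGTATAGTT"   # hypothetical read with one A>G change

FOUND = list_mismatches(REFERENCE, TOY_READ)
LABELS = [label_mismatch(ref, obs) for _, ref, obs in FOUND]
print(FOUND, LABELS)
```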
Aligners
• (Too) Many choices…
• Each aligner has a wide array of options with DIFFERENT default settings.
• Bowtie aligner provides error rate and multi-mapping control :
bowtie -p 4 -n 1 -l 21 --nomaqround -k 10 --best --strata --chunkmbs 256
-k 10: report up to 10 multi-mappers
-n 1 -l 21: allow 1 mismatch in a length of 21 nt
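One way to keep these settings explicit in a pipeline is to build the command from named parameters, as in the sketch below. The flags are the ones shown in the slide; the index name, file names and the `-S` (SAM output) flag are assumptions added for this example, and newer bowtie releases may expect the index to be passed with `-x`.

```python
import shlex


def bowtie_command(index, reads, out_sam, threads=4, mismatches=1,
                   seed_len=21, max_hits=10):
    """Build (but do not run) a bowtie command line as an argument list."""
    return [
        "bowtie",
        "-p", str(threads),      # worker threads
        "-n", str(mismatches),   # mismatches allowed in the seed
        "-l", str(seed_len),     # seed length in nt
        "--nomaqround",
        "-k", str(max_hits),     # report up to this many alignments per read
        "--best", "--strata",
        "--chunkmbs", "256",
        "-S",                    # assumption: write SAM output
        index, reads, out_sam,   # hypothetical index and file names
    ]


COMMAND = bowtie_command("genome_index", "sample.fastq", "sample.sam")
print(shlex.join(COMMAND))
```

The list can be passed to `subprocess.run` once the paths point at real files; printing it first makes the aligner settings part of the analysis record.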
Fastq calibration dataset:
hsa-let-7f-5p_M_chr9_94176353_94176374_+#chrX_53557246_53557267_- 0 chr9 94176353 255 22M * 0 0 TGAGGTAGTAGATTGTATAGTT
Header fields: miRNA ID (hsa-let-7f-5p), mapping location #1 (chr9_94176353_94176374_+), mapping location #2 (chrX_53557246_53557267_-)
• Available for ALL species present in miRBase, features include:
i) Each header defines miRBase mapping location
ii) Contains all miRBase entries with all single nucleotide mismatches
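A sketch of how a calibration read could be checked against its own header. The header layout is inferred from the single example above (miRNA ID, a one-letter tag whose meaning is not stated in the slides, then `#`-separated locations written as chromosome_start_end_strand), so the parsing rules here are assumptions.

```python
from typing import NamedTuple


class Location(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


def parse_location(text):
    """Parse 'chrom_start_end_strand' into a Location."""
    chrom, start, end, strand = text.rsplit("_", 3)
    return Location(chrom, int(start), int(end), strand)


def parse_header(header):
    """Split a calibration read name into miRNA id, tag and expected locations."""
    first, *others = header.split("#")
    prefix, first_location = first.rsplit("_", 4)[0], first.split("_", 2)[2]
    mirna_id, tag = prefix.rsplit("_", 1)
    locations = [parse_location(first_location)]
    locations += [parse_location(item) for item in others]
    return mirna_id, tag, locations


def alignment_is_expected(sam_line):
    """True if the reported alignment matches one location in the read name."""
    fields = sam_line.split()
    _, _, locations = parse_header(fields[0])
    flag, chrom, position = int(fields[1]), fields[2], int(fields[3])
    strand = "-" if flag & 16 else "+"  # SAM flag bit 0x10 marks the reverse strand
    return any(
        loc.chrom == chrom and loc.start == position and loc.strand == strand
        for loc in locations
    )


RECORD = (
    "hsa-let-7f-5p_M_chr9_94176353_94176374_+#chrX_53557246_53557267_- "
    "0 chr9 94176353 255 22M * 0 0 TGAGGTAGTAGATTGTATAGTT"
)
PARSED = parse_header(RECORD.split()[0])
print(PARSED, alignment_is_expected(RECORD))
```

Running every read of the synthetic data set through such a check gives a direct measure of how many known miRNAs a given aligner configuration places correctly.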
Non template additions (NTA)
i) Adenylation
ii) Uridylation
Koppers-Lalic et al., (2014), Cell Reports 8, 1649–1658
DETECTION METHODS:
• Aligners tend to softclip 3’ mismatches!!
• Remove adaptor
- Hard trim (18nt)
- Extend alignment.
- Look for mismatch clusters at end of read.
<miRNA seq> + (A)n
<miRNA seq> + (T)n
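The detection steps above can be sketched as a toy function: anchor the first 18 nt of an adaptor-trimmed read on a template, extend base by base while the read matches, and treat whatever is left at the 3' end as a possible non-template addition. The flanking bases in the toy template are invented, and this simple exact-match search is an assumption made for illustration, not the author's pipeline.

```python
def find_nta(read, template, seed_len=18):
    """Return (templated length, 3' tail, tail type), or None if the seed is absent."""
    read, template = read.upper(), template.upper()
    if len(read) < seed_len:
        return None
    start = template.find(read[:seed_len])  # hard-trimmed seed, exact match
    if start < 0:
        return None
    matched = seed_len
    # Extend the alignment while the read keeps matching the template.
    while (matched < len(read)
           and start + matched < len(template)
           and read[matched] == template[start + matched]):
        matched += 1
    tail = read[matched:]
    if not tail:
        kind = "none"
    elif set(tail) == {"A"}:
        kind = "adenylation"
    elif set(tail) == {"T"}:
        kind = "uridylation"  # U appears as T in sequencing reads
    else:
        kind = "other"
    return matched, tail, kind


MATURE = "TGAGGTAGTAGATTGTATAGTT"
TOY_TEMPLATE = "CC" + MATURE + "GTGGC"  # flanking bases are made up

print(find_nta(MATURE + "AA", TOY_TEMPLATE))
print(find_nta(MATURE + "TT", TOY_TEMPLATE))
```

A tail base that happens to match the template is absorbed into the templated part, so additions of this kind are undercounted whenever the downstream genomic base is the same as the added one.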
Assigning miRNA counts
Mature miRNA analysis
i) 5’ isomirs
ii) 3’ isomirs
iii) Non canonical
iv) Arm switching
v) Length
vi) Editing
Cistronic analysis (i) (ii)
Humphreys et al., 2013, NAR
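For the first two categories and the length category, a read can be labelled by comparing its alignment coordinates with the annotated mature miRNA. The sketch below assumes 1-based inclusive coordinates on the same strand of the stem-loop; the function name and the example coordinates are hypothetical.

```python
def classify_isomir(read_start, read_end, mature_start, mature_end):
    """Label a read relative to the annotated mature miRNA coordinates."""
    labels = []
    if read_start != mature_start:
        labels.append("5' isomiR")
    if read_end != mature_end:
        labels.append("3' isomiR")
    if (read_end - read_start) != (mature_end - mature_start):
        labels.append("length variant")
    return labels or ["canonical"]


# Hypothetical annotation: mature miRNA at positions 10-31 of its stem-loop.
print(classify_isomir(10, 31, 10, 31))
print(classify_isomir(11, 31, 10, 31))
print(classify_isomir(10, 32, 10, 31))
```

The remaining categories (non-canonical processing, arm switching, editing) need more than coordinates, for example the sequence itself and counts from both arms.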
miRspring
• Small (<2MB) HTML document that replicates the miRNA aligned sequencing data.
• Needs NO internet connectivity.
• Provides visualization of sequence data + research tools == complete transparency.
http://miRspring.victorchang.edu.au
Humphreys D.T., and Suter C.M. Nucleic Acids Research 2013.
Cumulative distribution of miRNA reads
Sampling bias!
Tissue Atlas: Heart, Kidney, Liver, Lung, Ovary, Spleen, Testes, Thymus, Brain, Placenta
AGO IP: THP-1
ENCODE: HeLa S3, A549, Ag04450, Bj, Gm1287, H1hesc, HepG2, Huvec, K562, MCF7, NheK, Sknshra
• 73 miRspring documents
• 895 million sequence tags
• < 55 megabytes of disk space
In most cell lines and tissues the most abundant miRNA should comprise < 35% of all aligned miRNA sequences
OK ☺
Top 100 miRNAs typically:
- 22nt long
- Good correlation with miRBase
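The 35% rule of thumb and the cumulative distribution can be checked directly from a table of per-miRNA counts. The sketch below is a minimal version of such a quality-control step; the function names and the toy counts are invented for illustration.

```python
def cumulative_fractions(counts):
    """Cumulative fraction of reads, most abundant miRNA first."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no aligned miRNA reads")
    running = 0
    result = []
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for name, count in ranked:
        running += count
        result.append((name, running / total))
    return result


def top_mirna_fraction(counts):
    """Fraction of all aligned miRNA reads taken by the most abundant miRNA."""
    return max(counts.values()) / sum(counts.values())


def passes_dominance_check(counts, threshold=0.35):
    """True if the most abundant miRNA stays below the threshold fraction."""
    return top_mirna_fraction(counts) < threshold


# Toy counts with hypothetical miRNA names.
BALANCED = {"miR-a": 300, "miR-b": 250, "miR-c": 250, "miR-d": 200}
SKEWED = {"miR-a": 900, "miR-b": 100}

print(cumulative_fractions(BALANCED))
print(passes_dominance_check(BALANCED), passes_dominance_check(SKEWED))
```

A library that fails the check is not necessarily wrong, but it is worth inspecting for sampling bias before normalisation or differential expression is attempted.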
Conclusions
• Many challenges in miRNA-seq analysis
• Multi-mappers
• Mismatches
• Best practices: be methodical
• Know the question you wish to address
• Know your species (reference/miRBase)
• Know your aligner
• Test your pipeline!
• Know what you are missing
• Quality control metrics/ visualisation
Joshua Ho
Peter Szot
Catherine Suter
Diane Fatkin
Thomas Priess
St Vincent’s Hospital
Chris Hayward
Kavitha
Andrew Jabbour
If you would like a miRBase test data set for any species/reference combination, please don’t hesitate to contact me.
d.humphreys@victorchang.edu.au
miRspring.victorchang.edu.au
- Fastq synthetic data sets
- Intelligently assign multi-mappers
- R objects