SMARTAR: small RNA transcriptome analyzer

Geuvadis RNA analysis meeting, April 16th 2012

Esther Lizano and Marc Friedländer

Xavier Estivill lab

Programme for Genes and Disease

Center for Genomic Regulation (CRG)

Overview of analysis

1: quality control

2: pre-processing

3: length profiling

4: genome mapping

5: annotation breakdown

6: miRNA de novo discovery

7: profiling of (iso-) miRNAs

8: visualization of data

9: summary of results

1: quality control

• 1a: sequencing reads are trimmed to 36 nts (to reduce batch effect)

• 1b: PHRED quality scores are visualized

• software: FASTX package from Hannon lab
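
Steps 1a and 1b can be sketched in a few lines of Python. This is only an illustration of the idea, not the FASTX commands used in the pipeline; the function names and the PHRED+33 quality encoding are assumptions (older Illumina output may use an offset of 64).

```python
from statistics import mean

TRIM_LENGTH = 36  # step 1a: all reads are cut to the same length


def trim_read(seq, qual, length=TRIM_LENGTH):
    """Trim a read and its quality string to at most `length` nts."""
    return seq[:length], qual[:length]


def phred_scores(qual, offset=33):
    """Decode an ASCII quality string (offset 33 is an assumption)."""
    return [ord(c) - offset for c in qual]


def mean_phred_per_position(quals, offset=33):
    """Mean PHRED score at each read position, e.g. as input for a plot (1b)."""
    decoded = [phred_scores(q, offset) for q in quals]
    longest = max(len(d) for d in decoded)
    return [mean(d[i] for d in decoded if len(d) > i) for i in range(longest)]
```

The per-position means are the kind of summary that a quality plot such as the one on the next slide would display.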

1: quality control (“good” quality example)

2: pre-processing

• 2a: homo-polymer filtering (if 33 or more of the nts are the same)

• 2b: quality filtering (if more than 50% of the nts have PHRED 10 or less)

• 2c: adaptor clipping (searches for first 8 nts of adapter, 1 mismatch allowed)

• 2d: identical sequences are collapsed (to FASTA format)

• 2e: length filtering (if clipped sequence is less than 18 nts)

• software: seqbuster, adrec, mapper.pl, FASTX.
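
The filters 2a to 2e can be sketched as follows. This is a minimal illustration under stated assumptions, not the behaviour of seqbuster, adrec, mapper.pl or FASTX: "the same" in 2a is read as base composition (not a consecutive run), reads without a detectable adapter are discarded, and the length filter is applied before collapsing (which gives the same result as collapsing first). All function names are hypothetical.

```python
from collections import Counter


def is_homopolymer(seq, max_same=33):
    """2a: True if `max_same` or more nts of the read are the same base."""
    if not seq:
        return False
    return max(Counter(seq).values()) >= max_same


def is_low_quality(phred, cutoff=10, max_fraction=0.5):
    """2b: True if more than 50% of the nts have PHRED `cutoff` or less."""
    low = sum(1 for q in phred if q <= cutoff)
    return low > max_fraction * len(phred)


def clip_adapter(seq, adapter, seed_len=8, max_mismatch=1):
    """2c: cut the read where the first `seed_len` nts of the adapter
    match with at most `max_mismatch` mismatches; None if not found."""
    seed = adapter[:seed_len]
    for start in range(len(seq) - seed_len + 1):
        window = seq[start:start + seed_len]
        mismatches = sum(a != b for a, b in zip(window, seed))
        if mismatches <= max_mismatch:
            return seq[:start]
    return None


def preprocess(reads, adapter, min_len=18):
    """Apply 2a-2e to (sequence, PHRED list) pairs.
    Returns a Counter mapping each clipped sequence to its read count (2d)."""
    collapsed = Counter()
    for seq, phred in reads:
        if is_homopolymer(seq) or is_low_quality(phred):
            continue
        clipped = clip_adapter(seq, adapter)
        # assumption: reads with no adapter are dropped together with
        # reads that are shorter than `min_len` after clipping (2e)
        if clipped is None or len(clipped) < min_len:
            continue
        collapsed[clipped] += 1
    return collapsed
```

The collapsed counts could then be written out as FASTA records with the count encoded in the identifier.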

3: length profiling (the “good”)

3: length profiling (the “bad”)

3: length profiling (the “ugly”)

4: miRNA de novo discovery

• 4a: reads are mapped stringently to the genome

• 4b: potential miRNA hairpins are excised using mappings as guidelines

• 4c: for each potential miRNA hairpin a score is assigned based on:

– RNA structure and

– positions of sequenced RNAs

• software: miRDeep2

Figure from Friedländer et al., Nature Biotech. 2008.

when miRNA precursors are processed by the Dicer protein, three products are released (a)

when mapped back to the precursor, these products will fall in a particular pattern (the ‘signature’)

in contrast, random degradation will not follow this pattern (b)

the fit of sequenced RNA to this model of biogenesis is scored probabilistically by miRDeep
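
The "signature" idea can be illustrated with a toy check of how many reads on a candidate hairpin coincide with the three expected products (taken here to be the mature strand, the loop and the star strand). This is explicitly not miRDeep's probabilistic score; the coordinates, the tolerance of 2 nts and the function names are hypothetical.

```python
def fits_signature(read_start, read_end, regions, tolerance=2):
    """Return the name of the expected product whose ends the read matches
    (within `tolerance` nts at both ends), or None if it matches none."""
    for name, (start, end) in regions.items():
        if abs(read_start - start) <= tolerance and abs(read_end - end) <= tolerance:
            return name
    return None


def signature_fraction(reads, regions, tolerance=2):
    """Fraction of read counts that fit one of the expected products.
    `reads` holds (start, end, count) tuples on the hairpin."""
    total = sum(count for _, _, count in reads)
    fitting = sum(count for start, end, count in reads
                  if fits_signature(start, end, regions, tolerance) is not None)
    return fitting / total if total else 0.0
```

A hairpin processed by Dicer would be expected to give a fraction close to 1, whereas randomly degraded RNA would scatter reads across the hairpin and give a low fraction.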

5: profiling of miRNAs and isoforms

• reads are aligned to precursors allowing 1 mismatch and 1 addition in the 3’ end

• if a read locates within the boundary of the annotated mature miRNA (plus / minus 3 nts) it is annotated as such

• if a read maps equally well to more than one miRNA, it is counted towards all of them

• software: seqbuster, miraligner
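
The annotation and counting rules above can be sketched as below. The alignment itself (1 mismatch, 1 addition at the 3’ end) is not implemented here; the sketch starts from alignments that are already computed. It assumes that "within the boundary" means the read lies inside the annotated mature miRNA extended by 3 nts on each side, and the data layout and names are hypothetical rather than those of seqbuster or miraligner.

```python
from collections import defaultdict


def annotate_alignment(precursor, start, end, matures, flank=3):
    """Names of annotated mature miRNAs on `precursor` whose boundaries,
    extended by `flank` nts on each side, contain the aligned read."""
    return [name for name, (prec, m_start, m_end) in matures.items()
            if prec == precursor
            and start >= m_start - flank
            and end <= m_end + flank]


def count_mirnas(read_alignments, matures, flank=3):
    """read_alignments: iterable of (read_count, alignments), where
    alignments lists the equally good (precursor, start, end) hits.
    A read hitting several miRNAs is counted towards all of them."""
    counts = defaultdict(int)
    for read_count, alignments in read_alignments:
        names = set()
        for precursor, start, end in alignments:
            names.update(annotate_alignment(precursor, start, end, matures, flank))
        for name in names:
            counts[name] += read_count
    return dict(counts)
```

Counting a multi-mapping read in full towards every miRNA means that the per-miRNA counts can sum to more than the number of reads.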

seqbuster vs. mirdeep2 profiling

6: genome mapping

• 6a: reads are mapped to the genome, allowing one mismatch and an infinite number of mappings

• also included are the unassembled parts of the human genome and genomes of known human viral pathogens

• 6b: the mappings are converted to nucleotide-resolution intensities

• e.g. if a nucleotide is covered by a read with a single genome mapping and a read with ten genome mappings, it will be assigned an intensity of 1 + 0.1 = 1.1

• software: bowtie and custom source
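
The intensity calculation in 6b can be sketched as follows: each mapping of a read contributes 1 divided by the number of genome mappings of that read to every nucleotide it covers. The tuple layout (0-based, half-open coordinates) and the function name are assumptions, not the custom source used in the pipeline.

```python
from collections import defaultdict


def nucleotide_intensities(mappings):
    """mappings: iterable of (read_id, chrom, strand, start, end) with
    0-based, half-open coordinates (an assumed layout).
    Returns {(chrom, strand, position): intensity}."""
    mappings = list(mappings)

    # number of genome mappings per read
    n_mappings = defaultdict(int)
    for read_id, *_ in mappings:
        n_mappings[read_id] += 1

    # each mapping adds 1 / (number of mappings of its read) per nucleotide
    intensity = defaultdict(float)
    for read_id, chrom, strand, start, end in mappings:
        weight = 1.0 / n_mappings[read_id]
        for pos in range(start, end):
            intensity[(chrom, strand, pos)] += weight
    return dict(intensity)
```

With one uniquely mapping read and one read with ten mappings covering the same nucleotide, the nucleotide receives 1 + 0.1 = 1.1, as in the example above.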

7: annotation breakdown

• 7a: intensities are intersected with simple annotation (15 classes)

• 7b: intensities are intersected with detailed annotation (40 classes)

• 7c: intensities are intersected with individual gene annotations (>3 million classes)

• annotations are based on GENCODE version 8, but custom annotations are used for miRNAs, snoRNAs, rRNAs, LINEs, Alus, introns and anti-sense annotations.

• each nucleotide on each strand has exactly one annotation. This is resolved using a priority hierarchy

• software: custom source
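
The priority hierarchy can be sketched like this: when several annotation classes overlap a nucleotide, the class ranked highest in the hierarchy is kept, and the intensity of that nucleotide is added to the total of that class. The priority order, the "unannotated" fallback label and the data layout are hypothetical; the slides do not state the actual hierarchy.

```python
from collections import defaultdict


def resolve_annotation(overlapping, priority, default="unannotated"):
    """Pick exactly one class for a nucleotide: the overlapping class
    that comes first in `priority` (`default` if nothing overlaps)."""
    if not overlapping:
        return default
    rank = {cls: i for i, cls in enumerate(priority)}
    return min(overlapping, key=lambda cls: rank[cls])


def annotation_breakdown(intensity, annotations, priority):
    """Sum nucleotide intensities per annotation class.
    intensity:   {(chrom, strand, pos): value}
    annotations: list of (chrom, strand, start, end, cls), half-open."""
    totals = defaultdict(float)
    for (chrom, strand, pos), value in intensity.items():
        classes = [cls for c, s, start, end, cls in annotations
                   if c == chrom and s == strand and start <= pos < end]
        totals[resolve_annotation(classes, priority)] += value
    return dict(totals)
```

The linear scan over all annotations is only suitable for a toy example; with millions of gene-level classes an interval index would be needed.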

7: simple annotation breakdown (the “good” and the “ugly”)

8: visualization (the miRNA “onco-cluster”)

Figure labels: miRNA precursor hairpins; miRNA primary transcript; sRNA data peaks

9: summary

• 9a: a summary is given of how many reads were:

– quality filtered

– length filtered

– not mapped to the genome

– successfully mapped to the genome

• typically more than 80% of the reads are mapped to the genome.

• software: custom source
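
A per-sample summary of this kind can be sketched as below. The category names follow the list above; expressing each category as a percentage of all input reads is an assumption, as is the function name.

```python
def summarize(counts):
    """counts: {category: number of reads}. Returns the percentage of
    all reads falling in each category (assumes the categories are
    mutually exclusive and together cover every input read)."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


# made-up numbers, for illustration only
example = {
    "quality filtered": 50,
    "length filtered": 100,
    "not mapped to the genome": 50,
    "mapped to the genome": 800,
}
for category, percent in summarize(example).items():
    print(f"{category}: {percent:.1f}%")
```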

Current state of analysis

1: quality control

2: pre-processing

3: length profiling

8: visualization of data

9: summary of results

4: genome mapping

7: profiling of (iso-) miRNAs

(figure legend: to do / done)

Outlook

• decide what data-sets should be discarded (>80% mapped, <25% miRNA)

• miRNA prediction and quantification (including custom allele-specific sequences)

• allele-specific gene expression

• correlate with genome variants (eQTL analysis)

• correlate with target mRNAs (TargetScanS v6 target predictions)

Acknowledgements

Data generation: Esther Lizano, Tuuli Lappalainen

Analysis: Marc Friedländer, Esther Lizano

Source development: Marc Friedländer, Lorena Pantano

Allele-specific sequences: Tuuli Lappalainen

Supervisor: Eulàlia Martí, Xavier Estivill

Fellowships: MF acknowledges an EMBO long-term fellowship
