Expression Genomics Laboratory -...

Post on 28-Jul-2020


Transcriptomics 101

Expression Genomics Laboratory
http://www.expressiongenomics.org

Nicole Cloonan
n.cloonan@expressiongenomics.org

Winter School, 7th July 2009

I want to know the maths

I want the best software package

1. Align tags to the genome
2. Measure gene expression
3. Find mutations
4. Find novel expression
5. Assemble transcripts
6. Win Nobel Prize

[Figure: bar chart with categories 1 to 5 on the x-axis and a 0 to 25 y-axis; June 2009]

Presentation Outline

Introduction
• What is a transcriptome?
• What can we learn from studying it?

Transcriptomics
• Genomic tools for transcriptomics.
• Deriving biological insight from transcriptomics.

Sequencing the transcriptome
• What’s old is new again.
• Double stranded protocols.
• Strand specific protocols.

Working with RNA data
• Mapping and quantitation.
• Genomic context of gene expression.
• SNPs, exon-junctions, novel genes.

Working with miRNA data
• The problem of limited information content.
• Known and novel expression.
• IsomiRs.

Single transcript gene: one gene, one mRNA, one protein

[Figure: gene model with a single transcript]
Legend: TSS = transcription start site; ATG/AUG = translation start site; pA = polyadenylation signal; AAA = polyadenylation; protein coding regions; non-coding regions; genomic DNA; microRNAs; spliced intron.

• All exons: full length protein

Alternative splicing: one gene, many mRNAs, many proteins

[Figure: gene model with alternatively spliced transcripts]

• All exons: full length protein
• Intron retention: new STOP codon, truncated protein, altered function
• Exon skipping: changed domain content, altered function
• Exon skipping: new STOP codon, truncated protein, altered function

Alternative promoters: expands coding output and gene control

[Figure: gene model with transcripts from alternative transcription start sites]

• All exons: full length protein
• Alt TSS: differential control of gene, tissue specific or temporally specific, altered 5’ UTR content
• Alt TSS: altered 5’ UTR content, new ATG codon, expanded protein, altered function

Alternative 3’ exons: can change ORF and 3’UTR content

[Figure: gene model with transcripts using alternative 3’ exons and polyadenylation signals]

• All exons: full length protein
• Alternative 3’ exon: different 3’UTR content, can change the ORF
• Alternative pA: different 3’UTR content

Transcriptional complexity

[Figure: a complex locus with multiple TSSs, translation start sites (ATG) and polyadenylation signals (pA), producing many transcript isoforms; small RNAs labelled PASR, TASR, miRNA and tiRNA]
Legend: TSS = transcription start site; ATG = translation start site; pA = polyadenylation signal; AAA = polyadenylation; protein coding regions; non-coding regions; genomic DNA; microRNAs; spliced intron.


Microarrays

[Figure: microarray workflow]
• Sample to study
• Extract RNA
• Label RNA
• Prepare microarray (short probes)
• Hybridize
• Scan

Microarray based profiling

[Figure: male gene expression plotted against female gene expression; Wnt4, Sox9 and Amh are labelled]

13.5dpc male vs female gonad

Gene expression predates morphology

Microarray based profiling

Gene expression patterns correlate strongly with prognosis

Molecular profiling of human cancer. Nature Reviews Genetics 1; 48-56 (2000)

Limitations of microarrays

• Limited sensitivity
• Limited dynamic range
• Cross-hybridization
• Detection limited by probe design

Using arrays to survey transcriptional complexity

[Figure: the complex locus and its transcript isoforms, with the coverage of three array types: microarray, exon arrays, exon-junction arrays]


[Figure: the complex locus and its transcript isoforms, with the regions sampled by tag-based methods: 5’ SAGE, 3’ SAGE, MPSS, di-tag/mate-pair]

RNA sequencing

[Figure: the complex locus and its transcript isoforms, sampled along their whole length by shotgun sequencing]

Shotgun sequencing

SQRL protocol

Step 1: pre-process RNA
Step 2: 1st strand cDNA
Step 3: template switch
Step 4: PCR amplification

[Figure: polyadenylated RNA primed with random hexamers (NNNNNN) carrying the FDV adaptor; the CCC overhang of the first strand cDNA pairs with an rGGG oligo carrying the RDV adaptor; PCR with the RDV and FDV primers]

The SQRL protocol generates antisense short-tags

LEGenD protocol

Step 1: pre-process RNA
Step 2: Adaptor Ligation
Step 3: 1st Strand cDNA
Step 4: PCR amplification

[Figure: RDV and FDV adaptors with NN overhangs ligated to the RNA, followed by first strand cDNA synthesis and PCR with the RDV and FDV primers]

The LEGenD protocol generates sense short-tags

Most common RNAseq protocols

Step 1: pre-process RNA
Step 2: 1st and 2nd strand cDNA
Step 3: Adaptor Ligation
Step 4: PCR amplification

[Figure: polyadenylated RNA primed with random hexamers (NNNNNN), converted to double stranded cDNA, ligated to RDV and FDV adaptors in either orientation, then PCR amplified]

The RNAseq protocol generates unstranded short-tags
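The three library types differ in which strand the short tag reports (antisense for SQRL, sense for LEGenD, unstranded for the common RNAseq protocols). A minimal Python sketch of how that orientation could be used to infer the strand of the source transcript from an aligned tag. The function name and the "+", "-", "." encoding are illustrative assumptions, not part of any of the protocols.

```python
# Orientation of short tags relative to the original RNA, as stated on the slides.
PROTOCOL_ORIENTATION = {
    "SQRL": "antisense",
    "LEGenD": "sense",
    "RNAseq": "unstranded",
}


def transcript_strand(protocol: str, tag_strand: str) -> str:
    """Return '+', '-' or '.' (unknown) for the strand of the source transcript.

    tag_strand is the genomic strand the tag aligned to ('+' or '-').
    """
    if tag_strand not in ("+", "-"):
        raise ValueError("tag_strand must be '+' or '-'")
    orientation = PROTOCOL_ORIENTATION[protocol]
    if orientation == "sense":
        return tag_strand
    if orientation == "antisense":
        return "-" if tag_strand == "+" else "+"
    return "."  # unstranded: the tag carries no strand information


for protocol in PROTOCOL_ORIENTATION:
    print(protocol, transcript_strand(protocol, "+"))
```

With an unstranded library, overlapping genes on opposite strands cannot be separated this way, which is the practical argument for the strand specific protocols.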


Aligning tags to a reference genome

[Figure: gene model with its spliced transcript and short tags aligned to the genome]
Legend: TSS = transcription start site; ATG/AUG = translation start site; pA = polyadenylation signal; AAA = polyadenylation; protein coding regions; non-coding regions; genomic DNA; microRNAs; spliced intron.

The fastest alignment methods are ungapped… but what about junctions?
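To make the junction problem concrete, here is a small Python sketch of ungapped matching with a mismatch limit. All sequences are invented toy data and the function names are hypothetical. It shows a tag that spans an exon-exon junction failing against the genome (the intron is in the way) but matching a pre-built junction sequence.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def ungapped_hits(tag: str, reference: str, max_mismatches: int = 0) -> list:
    """Start positions where the tag aligns without gaps."""
    n = len(tag)
    return [
        i
        for i in range(len(reference) - n + 1)
        if hamming(tag, reference[i:i + n]) <= max_mismatches
    ]


# Toy locus: two exons separated by an intron (invented sequences).
exon1 = "ACGTACGGTCA"
intron = "GTAAGTTTTTTTTTCAG"
exon2 = "TTGCCATAGGC"
genome = exon1 + intron + exon2
junction = exon1 + exon2  # the spliced exon1-exon2 sequence

tag = exon1[-6:] + exon2[:6]  # a tag spanning the exon-exon junction

print("genome hits:  ", ungapped_hits(tag, genome, max_mismatches=1))
print("junction hits:", ungapped_hits(tag, junction, max_mismatches=1))
```

Keeping the aligner ungapped and moving the splicing knowledge into the reference is the design choice behind exon-junction libraries.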

Random fragmentation of RNA libraries

[Figure: histograms of frequency (0 to 350) against length of captured RNA (1 to 86); below, a short tag made up of captured RNA followed by adaptor, with the short-tag length marked]

Allowing for errors when mapping

Sources of differences between a tag and the reference DNA:
• Amplification errors
• Measurement errors
• Mapping errors
• Base changes in RNA sample: polymorphisms, allele specific expression, RNA editing

What is the minimum alignment length I should use for my genome?

How many errors should I allow at the mapping length used?
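One rough way to reason about both questions is to estimate how many chance matches a tag of a given length would have. The Python sketch below assumes a genome of uniformly random, independent bases searched on both strands, which real genomes are not (repeats make matters worse), so it gives a lower bound rather than an answer. The genome size of 3.1e9 is an approximate human figure used only for illustration.

```python
from math import comb


def neighbourhood_size(length: int, max_mismatches: int) -> int:
    """Number of distinct sequences within max_mismatches of one tag."""
    return sum(comb(length, j) * 3 ** j for j in range(max_mismatches + 1))


def expected_chance_hits(genome_size: float, length: int, max_mismatches: int) -> float:
    """Expected number of random matches, counting both strands."""
    p_match = neighbourhood_size(length, max_mismatches) / 4 ** length
    return 2 * genome_size * p_match


for length in (25, 30, 35):
    for mismatches in (0, 1, 2, 3):
        hits = expected_chance_hits(3.1e9, length, mismatches)
        print(f"length={length} mismatches={mismatches} expected chance hits={hits:.2e}")
```

Shorter tags and more permitted mismatches both raise the number of chance matches, so the two parameters have to be chosen together.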

Unique- vs multi-mapping tags

Unique ≠ accurate

[Figure: bar chart, 0% to 100%, for the parameter sets 35.0.0, 35.1.0, 35.1.1, 35.2.0, 35.2.1, 35.3.0, 30.0.0, 30.1.0, 30.1.1, 30.2.0, 30.2.1, 30.3.0, 25.0.0, 25.1.0, 25.1.1, 25.2.0, 25.2.1, 25.3.0; series: % sim, % mum 5, % mum 10, % sims in known exons]

Unique ≠ accurate

Chr A: tagcgggatctctcgagagctcgcgat
Chr B: tagcgggatctctcgacagctcgcgat

Tag tctctcgacagct: 1 MM to Chr A, 0 MM to Chr B
Tag tctctcgagagct: 0 MM to Chr A, 1 MM to Chr B
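The Chr A / Chr B example can be replayed in a few lines of Python. The sequences are the ones on the slide; the helper names are hypothetical. With zero mismatches allowed, the tag tctctcgacagct maps uniquely to Chr B, yet it could equally be a Chr A transcript carrying one sequencing error or SNP, which only becomes visible once a mismatch is allowed.

```python
references = {
    "Chr A": "tagcgggatctctcgagagctcgcgat",
    "Chr B": "tagcgggatctctcgacagctcgcgat",
}


def best_mismatches(tag: str, reference: str) -> int:
    """Fewest mismatches over all ungapped placements of the tag."""
    n = len(tag)
    return min(
        sum(a != b for a, b in zip(tag, reference[i:i + n]))
        for i in range(len(reference) - n + 1)
    )


def loci_within(tag: str, refs: dict, max_mismatches: int) -> list:
    """Names of the references the tag matches within the mismatch limit."""
    return [name for name, seq in refs.items()
            if best_mismatches(tag, seq) <= max_mismatches]


tag = "tctctcgacagct"
print("0 MM allowed:", loci_within(tag, references, 0))
print("1 MM allowed:", loci_within(tag, references, 1))
```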

Unique ≠ accurate

[Figure: bar chart, 0% to 100%, for the parameter sets 35.0.0 to 25.3.0; series: % sim, % mum 5, % mum 10, % sims in known exons]

IDEAL: match at the longest possible length

RNA-MATE v1.1
http://www.expressiongenomics.org/RNA-MATE/

[Flowchart: Start -> Read Configuration File -> check quality? (Yes: Quality Check) -> Genome/Junction Alignment -> tag aligned? (No: Trim Tag and realign; Yes: Select Single Mapping Tags) -> rescue multimappers? (Yes: Multimapping Tag Rescue) -> Create Wiggle Plot Files and Create Junction BED Files -> End]

• perl/python coded
• unix command line (trialling web interface)
• currently set up for PBS managed cluster
• GNU General Public License v3.0
• junction libraries available

RNA-MATE v1.1
http://www.expressiongenomics.org/RNA-MATE/
n.cloonan@expressiongenomics.org

Configuration File

RNA-MATE v1.1 example configuration:

    tag_length=35,30
    num_mismatch=3
    mask=11111111111111111111111111111111111
    max_multimatch=10
    expect_strand=+
    rescue_window=10
    exp_name=tag_20000_F3
    chromosomes=chrM,chr2
    chr_path=/data/matching/hg18_fasta/
    junction=/data/libraries/hg18_junctions.fasta.cat
    junction_index=/data/libraries/hg18_junctions.fasta.index
    output_root=/data/cxu/
    output_dir=/data/cxu/tag_20000_F3/
    raw_qual=/data/raw/tag20000.qual
    raw_csfasta=/data/raw/tag20000.csfasta
    quality_check=true
    script_chr_start=/data/matching/chr_start.pl
    script_chr_wig=/data/matching/chr_wig.pl
    f2m=/data/matching/f2m.pl
    mapreads=/data/matching/mapreads
    master_script=/data/matching/rna-mate-v1.0.pl
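The configuration file is a flat list of key=value pairs. As an illustration only (RNA-MATE itself is Perl/Python and its own parser is not shown on the slide), a file of this shape could be read in Python as below. Skipping blank lines and "#" comments is an assumption made for the sketch, and only a few of the slide's keys are reproduced.

```python
EXAMPLE = """\
tag_length=35,30
num_mismatch=3
max_multimatch=10
expect_strand=+
rescue_window=10
chromosomes=chrM,chr2
quality_check=true
"""


def parse_config(text: str) -> dict:
    """Parse key=value lines into a dict of strings."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: comments allowed
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


config = parse_config(EXAMPLE)
tag_lengths = [int(x) for x in config["tag_length"].split(",")]
chromosomes = config["chromosomes"].split(",")
print(tag_lengths, chromosomes, config["expect_strand"])
```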

Quality Check (optional)

RNA-MATE v1.1

[Figure: quality values (QV) plotted per basecall]

Pass: < 5 basecalls where QV < 10. Otherwise: Fail.

Checked at 25mers, 30mers and 35mers.
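The pass criterion on the slide (fewer than 5 basecalls with QV below 10) is simple enough to write down directly. A minimal Python sketch, with hypothetical function and parameter names, applied to a tag at the three lengths shown:

```python
def passes_quality(qvs, qv_threshold=10, max_low_calls=5):
    """True if fewer than max_low_calls basecalls have QV below qv_threshold."""
    low_calls = sum(1 for q in qvs if q < qv_threshold)
    return low_calls < max_low_calls


# Invented quality values: good at the start, poor at the 3' end.
qvs = [25] * 28 + [8] * 7

for length in (35, 30, 25):
    print(length, passes_quality(qvs[:length]))
```

Because quality often drops towards the end of a tag, a tag that fails as a 35mer can still pass once trimmed.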

Genome Alignment

RNA-MATE v1.1: recursive mapping strategy

[Figure: recursive mapping strategy; labels: Genome, Junction, Size, Discovery Bin, Matched Data]
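A recursive strategy of this kind can be sketched in Python as follows: try the full-length tag against the genome and then the junction library, and if nothing matches, trim the tag and try again. The order of the searches, trimming from the 3' end, the exact-match default and all names are assumptions made for illustration; the slide does not spell them out, and RNA-MATE delegates the alignment itself to an external mapper.

```python
def hits(tag, reference, max_mismatches):
    """Ungapped start positions of the tag within the mismatch limit."""
    n = len(tag)
    return [
        i
        for i in range(len(reference) - n + 1)
        if sum(a != b for a, b in zip(tag, reference[i:i + n])) <= max_mismatches
    ]


def recursive_map(tag, genome, junctions, lengths=(35, 30, 25), max_mismatches=0):
    """Return (library, length, positions) for the longest length that maps."""
    for length in lengths:
        if length > len(tag):
            continue
        trimmed = tag[:length]  # assumption: trim from the 3' end
        for name, reference in (("genome", genome), ("junction", junctions)):
            found = hits(trimmed, reference, max_mismatches)
            if found:
                return name, length, found
    return None  # unmatched at every length


# Toy data: 30 genomic bases followed by 5 bases of non-genomic sequence.
genome = "ACGTTGCAAGGCTTAACCGGATCGATTACGGCATGCAATTCCGGTTAAGGCCATAGC"
tag = genome[5:35] + "AAAAA"

print(recursive_map(tag, genome, junctions=""))
```

Matching at the longest possible length first is what keeps the shorter, less specific matches as a fallback rather than the default.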

Exon-junction libraries

[Figure: the complex locus and its transcript isoforms, from which exon-exon junction sequences are taken]
Legend: TSS = transcription start site; ATG = translation start site; pA = polyadenylation signal; AAA = polyadenylation; protein coding regions; non-coding regions; genomic DNA; microRNAs; spliced intron.

Multimapping Rescue (optional)

RNA-MATE v1.1

• Advantages:
  • can add 5-20% more data
  • can interrogate genomic regions previously hidden (genomic “black holes”)
• Disadvantages:
  • memory hungry
  • can slow down analysis

Multimapping Rescue (optional)

[Figure: a multi-mapping region shared by Locus A (predicted), Locus B and Locus C, drawn on genomic DNA with exons and positive and negative strand expression; a user defined window width is marked around the region]
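One common way to rescue a multi-mapping tag is to share it between its candidate loci in proportion to the uniquely mapping tags found within the window around each candidate. The Python sketch below illustrates that idea only. It is an assumption about the general approach, not a description of the RNA-MATE code, and it ignores strand and chromosome for brevity. The equal split when no locus has support is also an assumption.

```python
def rescue_weights(candidates, unique_positions, window=10):
    """Fraction of one multi-mapping tag to assign to each candidate position."""
    support = [
        sum(1 for u in unique_positions if abs(u - c) <= window)
        for c in candidates
    ]
    total = sum(support)
    if total == 0:
        # No nearby unique tags anywhere: split the tag equally.
        return [1 / len(candidates)] * len(candidates)
    return [s / total for s in support]


unique_positions = [100, 102, 105, 500]  # invented uniquely mapped tag starts
candidates = [101, 498, 900]             # invented candidate loci for one tag

print(rescue_weights(candidates, unique_positions, window=10))
```

The window corresponds to the user defined window width on the slide (rescue_window in the example configuration).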

BED and bedGraphs

RNA-MATE v1.1

• outputs:
  • strand specific bedGraphs (wiggle plots)
  • strand specific start site bedGraphs (for tag counting applications)
  • “expected strand” junction BED file (for visualization)
  • “unexpected strand” junction BED file (for assessing library directionality)
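For readers who have not met the format, a bedGraph line is chromosome, start, end and value, with 0-based half-open coordinates. A small Python sketch that turns a per-base coverage list into bedGraph lines by merging runs of equal coverage; the function name is hypothetical and this is not the RNA-MATE output code.

```python
def coverage_to_bedgraph(chrom, coverage):
    """Merge runs of equal, non-zero coverage into bedGraph lines."""
    lines = []
    start = 0
    for pos in range(1, len(coverage) + 1):
        # Close the current run at the end of the list or when the value changes.
        if pos == len(coverage) or coverage[pos] != coverage[start]:
            if coverage[start] != 0:
                lines.append(f"{chrom}\t{start}\t{pos}\t{coverage[start]}")
            start = pos
    return lines


for line in coverage_to_bedgraph("chr1", [0, 0, 2, 2, 3, 0, 1]):
    print(line)
```

Writing one such file per strand gives the strand specific tracks listed above.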

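Genomic context of expression

[Figure: genome browser view of the GRB7 locus with the following tracks and annotations]
• Gene Symbol: GRB7
• Single nucleotide resolution coverage plot
• Exon-exon junction usage
• Known gene structure (exons and introns)
• Alternative splicing
• Novel exons or novel transcripts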

Future Versions

RNA-MATE v1.1

• Web browser interface
• Integration of SNP analysis pipeline for transcriptome
• Allow the integration of other mapping algorithms
• Allow the integration of other exon-junction identification strategies

Novel exon-junction discovery (systematic)

[Figure: gene model of the complex locus]
Legend: TSS = transcription start site; ATG = translation start site; pA = polyadenylation signal; AAA = polyadenylation; protein coding regions; non-coding regions; genomic DNA; microRNAs; spliced intron.

Pros: Computationally easy

Cons: Does not find all novel splicing

Novel exon-junction discovery (de novo)

    ACGATATGACACGTACAGTCAAATCGT
    ACGATATTACACGTACATTCAAGTCGT
    ACGATATTACACGCACAGTCAAGTCGT
     CGATATTACACGTCCAGTCAAGTCGTT
       ATATTTCACGTACAGTCAAGTCGTTCG
       ATATTAAACGTACAGTCAAGTCGTTCG
         ATTGCACGTACAGTCAAGTCGTTCGGA
         ATTACACGTACAGTCACGTCGTTCGGA
             CACGTACAGTCAAGTCGTTCGGAACCT
             CACGTACCTTCAAGTCGTTCGGAACCT    aligned reads
    ACGATATTACACGTACAGTCAAGTCGTTCGGAACCT    consensus read

Non-matching tags -> Create consensus read -> remove adaptor sequence -> Blat against genome

Pros: De novo

Cons: Requires high coverage
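The "create consensus read" step can be illustrated with a per-column majority vote over the reads from the slide. In this Python sketch the offsets of the reads are supplied by hand (reconstructed from the slide's alignment); how the overlaps are found in practice is not covered here, and the function name is hypothetical.

```python
from collections import Counter

# (offset, read) pairs; offsets are assumed known for this illustration.
placed_reads = [
    (0, "ACGATATGACACGTACAGTCAAATCGT"),
    (0, "ACGATATTACACGTACATTCAAGTCGT"),
    (0, "ACGATATTACACGCACAGTCAAGTCGT"),
    (1, "CGATATTACACGTCCAGTCAAGTCGTT"),
    (3, "ATATTTCACGTACAGTCAAGTCGTTCG"),
    (3, "ATATTAAACGTACAGTCAAGTCGTTCG"),
    (5, "ATTGCACGTACAGTCAAGTCGTTCGGA"),
    (5, "ATTACACGTACAGTCACGTCGTTCGGA"),
    (9, "CACGTACAGTCAAGTCGTTCGGAACCT"),
    (9, "CACGTACCTTCAAGTCGTTCGGAACCT"),
]


def consensus(reads):
    """Majority base per column; 'N' where no read covers the column."""
    length = max(offset + len(seq) for offset, seq in reads)
    columns = [Counter() for _ in range(length)]
    for offset, seq in reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    return "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )


print(consensus(placed_reads))
```

Each read here carries one or two errors, but no column has a wrong majority. With lower coverage that would no longer hold, which is why the approach requires high coverage.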

Novel exon-junction discovery (TopHat)

[Figure: gene model of the complex locus with one spliced transcript]
Legend: TSS = transcription start site; ATG = translation start site; pA = polyadenylation signal; AAA = polyadenylation; protein coding regions; non-coding regions; genomic DNA; microRNAs; spliced intron.

http://tophat.cbcb.umd.edu

Pros: Very sensitive

Cons: Relies on reference

Substitutions and micro-indels

Dnttip3 Arid4b

Map tags to genome -> Align tags to identify SNPs -> Annotate SNPs (eg. SNP is non-synonymous in an ORF) -> Rank SNPs (eg. PolyPhen, CanPredict) -> Validate SNPs (eg. Sanger sequencing)

    ACGATATTACACGTACACTCAAGTCGTTCGGAACCT    Reference
    ACGATATTACACGTACATTCAAATCGT
    ACGATATTACACGTACATTCAACTCGT
    ACGATATTACACGCACATTCAAGTCGT
     CGATATTACACGTACATTCAAGTCGTT
       ATATTTCACGTACATTCAAGTCGTTCG
       ATATTAAACGTACATTCAAGTCGTTCG
         ATTACACGTACATTCAAGTCGTTCGGA
         ATTACACGTACATTCACGTCGTTCGGA
             CACGTACATTCAAGTCGTTCGGAACCT    Aligned Reads
    -----------------T------------------    SNP call
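The "align tags to identify SNPs" step amounts to a pileup: count the bases the aligned reads show at each reference position and call a SNP where a different base clearly dominates. The Python sketch below uses the reference and reads from the slide. The read offsets are reconstructed from the slide's alignment, and the depth and fraction thresholds are arbitrary assumptions for illustration, not recommended values.

```python
from collections import Counter

reference = "ACGATATTACACGTACACTCAAGTCGTTCGGAACCT"

# (offset, read) pairs as aligned on the slide.
aligned_reads = [
    (0, "ACGATATTACACGTACATTCAAATCGT"),
    (0, "ACGATATTACACGTACATTCAACTCGT"),
    (0, "ACGATATTACACGCACATTCAAGTCGT"),
    (1, "CGATATTACACGTACATTCAAGTCGTT"),
    (3, "ATATTTCACGTACATTCAAGTCGTTCG"),
    (3, "ATATTAAACGTACATTCAAGTCGTTCG"),
    (5, "ATTACACGTACATTCAAGTCGTTCGGA"),
    (5, "ATTACACGTACATTCACGTCGTTCGGA"),
    (9, "CACGTACATTCAAGTCGTTCGGAACCT"),
]


def call_snps(ref, reads, min_depth=3, min_fraction=0.8):
    """Return (position, reference base, called base), 0-based positions."""
    columns = [Counter() for _ in ref]
    for offset, seq in reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    calls = []
    for pos, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        base, n = counts.most_common(1)[0]
        if base != ref[pos] and n / depth >= min_fraction:
            calls.append((pos, ref[pos], base))
    return calls


print(call_snps(reference, aligned_reads))
```

The scattered single-read differences are ignored because they never reach the required fraction, while the position supported by every read is called.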

“Diagnostic” features

[Figure: four transcripts (A, B, C, D) from one locus, showing protein coding regions, non-coding regions, spliced introns and polyadenylation (AAA)]

Transcripts defined by Aceview (September 2007 release)

“Diagnostic” features

[Figure: four transcripts (A, B, C, D) from one locus, showing protein coding regions, non-coding regions, spliced introns and polyadenylation (AAA)]

92.6% known transcripts have diagnostic features (covers 99.8% of loci)
217127 diagnostic features covering 160156 individual transcripts from 65254 loci

Accuracy relies on the quality of the gene models used.

Different gene models will give different results from the same data.
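Read this way, a diagnostic feature is an exon or exon-exon junction that occurs in exactly one transcript of a locus, so that tags hitting it can be attributed to that transcript alone. A short Python sketch under that reading; the transcript names mirror the A to D labels on the slide, but the feature lists are invented.

```python
from collections import Counter

# Invented gene models: each transcript is a list of exon/junction labels.
transcripts = {
    "A": ["e1", "e2", "e3", "e1-e2", "e2-e3"],
    "B": ["e1", "e3", "e1-e3"],
    "C": ["e1", "e2", "e3", "e4", "e1-e2", "e2-e3", "e3-e4"],
    "D": ["e1", "e2", "e1-e2"],
}


def diagnostic_features(models):
    """Features found in exactly one transcript, keyed by transcript."""
    usage = Counter(f for feats in models.values() for f in set(feats))
    return {
        name: sorted(f for f in set(feats) if usage[f] == 1)
        for name, feats in models.items()
    }


for name, feats in diagnostic_features(transcripts).items():
    print(name, feats)
```

Transcript D has no diagnostic feature in this toy set, and adding or removing a single model changes which features are diagnostic for the others. This is the sense in which different gene models give different results from the same data.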

Differential Gene Expression

Microarray | Sequencing

http://www.bioconductor.org/packages/2.3/bioc/html/edgeR.html

Caution on Shotgun RNAseq analysis

Oshlack and Wakefield. Biol Direct. 2009; 4: 14.

Categories of genes that are enriched for short sequences:

• innate immunity
• cell-cell communication
• signal transduction


MicroRNAs can inhibit translation of mRNAs

[Figure: miRNA biogenesis. Nucleus: pri-miRNA -> Drosha processing -> pre-miRNA. Cytoplasm: Dicer processing -> miRNA duplex -> asymmetrical unwinding -> RNA-Induced Silencing Complex (RISC) -> RISC-mRNA interactions]

Outcomes of RISC-mRNA interactions:
• Translational inhibition
• mRNA sequestration
• mRNA degradation

microRNAs are small

[Figure: proportion of small RNAs (0% to 60%) against length of small RNAs in the databases (15 to 35); series: miRNAs, piRNAs]

Matches to the Genome

[Figure: two bar charts, 0% to 100%, for tag lengths 17mers to 27mers; one panel is labelled 1 colourspace mismatch]

IsomiRs are common

[Figure: number of identical reads along the reference sequence (5’ to 3’), including miRNA extension]

Red = reads that start from a different location than the Sanger reference

Blue = reads that start at the same location as the Sanger reference

31 miRNAs show the most abundant version starting from a different location than the Sanger reference

Optimizing small RNA mapping

Refining the reference set

Optimizing the mismatches

    miR-17-5p : CAAAGUGCUUACAGUGCAGGUAGU
    miR-20    : UAAAGUGCUUAUAGUGCAGGUAG-
    miR-106a  : AAAAGUGCUUACAGUGCAGGUAGC
    miR-106b  : UAAAGUGCUGACAGUGCAGAU---
    miR-93    : -AAAGUGCUGUUCGUGCAGGUAG-
    miR-18    : UAAGGUGCAUCUAGUGCAGAUA--

    miR-19a   : UGUGCAAAUCUAUGCAAAACUGA-
    miR-19b-1 : UGUGCAAAUCCAUGCAAAACUGA-
    miR-19b-2 : UGUGCAAAUCCAUGCAAAACUGA-
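One simple part of refining the reference set is to collapse entries whose mature sequences are identical, since no tag can ever tell them apart. In the miR-19 family shown on the slide, miR-19b-1 and miR-19b-2 are the same sequence. A Python sketch of that collapsing step; the merged naming scheme with "/" is an assumption for illustration.

```python
from collections import defaultdict

mature = {
    "miR-19a": "UGUGCAAAUCUAUGCAAAACUGA",
    "miR-19b-1": "UGUGCAAAUCCAUGCAAAACUGA",
    "miR-19b-2": "UGUGCAAAUCCAUGCAAAACUGA",
}


def collapse_identical(sequences):
    """Merge entries that share exactly the same sequence."""
    groups = defaultdict(list)
    for name, seq in sequences.items():
        groups[seq].append(name)
    return {"/".join(sorted(names)): seq for seq, names in groups.items()}


for name, seq in collapse_identical(mature).items():
    print(name, seq)
```

miR-19a differs from miR-19b by a single base, so the number of mismatches allowed has to be chosen with the closest family members in mind.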

Optimizing small RNA mapping

Refining the reference set

Optimizing the matching strategy

Optimizing the matching lengths

Optimizing the mismatches

[Figure: number of matches (0 to 120) against length of tag when matching (15 to 30)]

Optimizing small RNA mapping

Refining the reference set

Optimizing the matching strategy

Optimizing the matching lengths

Optimizing the mismatches

Filter spurious mappings

Comparisons with other platforms

[Figure: two scatter plots; r = 0.81 and r = 0.80]

Recursive or “vector stripping”

miRNA-MATE v1.0

[Flowchart, recursive path: Start -> 1 Read Configuration File -> 2 Decode Barcodes -> Custom Library Alignment -> tag aligned? (No: Trim Tag and realign; Yes: 3 Count miRNAs) -> End]

[Flowchart, vector stripping path: 4 Identify Adaptor -> 5 Custom Library Alignment -> tag aligned? (No: Discard Tag; Yes: 6 Translate to base-space) -> 7 Summarize isomiR usage -> 8 Create Sequence Logos -> End]

miRNA-MATE v1.0 (recursive output)

Reference sequences

• miR and miR* sequences
• miRBase (http://microrna.sanger.ac.uk/)
• The “dominant” miRNA appearing in the databases is determined to be the “functional” miRNA, and the other strand is “a non-functional by-product”. (Junk RNA: sound familiar?)
• The miR and miR* sequences can change


Adaptor Identification

T010202100202312312333020XXXXXXXXXXXXXX

[Figure labels: transition base (cleaved); SREK captured small RNA; transition base; “adaptor sequence”]

33020XXXXXXXXXXXXXX
| | | | | | | | | | | |

Tags are matched against a reference set of miRNAs that are not ambiguous.
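The reads on this slide are in colour-space: a known leading base followed by digits 0 to 3, each digit encoding the transition between two adjacent bases. The "translate to base-space" step can be sketched in Python with the standard two-base encoding, where the colour is the XOR of the 2-bit codes of neighbouring bases (A=0, C=1, G=2, T=3). The function name is hypothetical, and missing colour calls (often written as ".") are not handled.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def decode_colourspace(read: str) -> str:
    """Translate e.g. 'T0102' to base-space, dropping the leading primer base."""
    previous = BASES.index(read[0])
    decoded = []
    for colour in read[1:]:
        previous ^= int(colour)  # next base = previous base XOR colour
        decoded.append(BASES[previous])
    return "".join(decoded)


print(decode_colourspace("T0102"))
```

Every base depends on all the colours before it, so a single wrong colour call corrupts the rest of a naively decoded read. This is one reason to align in colour-space first and translate only the tags that aligned.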

Correlation with recursive mapping

r = 0.94

miRNA-MATE v1.0 (isomiR output)

Tissue specific isomiRism

[Figure: isomiR usage for hsa-miR-181 in Brain and in Ovary]

Could be important to know about this for qRT-PCR validation

Changes in the start site could change the “seed” region.
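A one-nucleotide shift in the start site is enough to change the seed. The Python sketch below assumes the commonly used definition of the seed as nucleotides 2 to 8 of the mature miRNA, and uses the miR-17-5p sequence from the alignment shown earlier. The shifted isomiR is hypothetical and is there only to show the effect.

```python
def seed(mirna: str, start: int = 2, end: int = 8) -> str:
    """Seed region using 1-based, inclusive positions (assumed nt 2 to 8)."""
    return mirna[start - 1:end]


reference = "CAAAGUGCUUACAGUGCAGGUAGU"  # miR-17-5p as shown on the slide
isomir = reference[1:]                   # hypothetical isomiR starting 1 nt downstream

print("reference seed:", seed(reference))
print("isomiR seed:   ", seed(isomir))
```

Since target recognition depends heavily on the seed, two isomiRs of the same miRNA could in principle regulate different sets of mRNAs.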


Conclusions

Field is in its infancy, not all challenges have been solved. We need more mathematical and statistical input!

RNAseq is a powerful way to increase the sensitivity and usefulness of global gene expression surveys.

Be cautious with your analysis. Think and plan your analysis before you get into the lab.

I want a Nature paper

Rubbish in, rubbish out.

Medical Genomics

The End

Expression Genomics Laboratory
http://grimmond.imb.uq.edu.au