JOBIM, 3 July 2012 · jobim2012.inria.fr/sources/slides/s14.pdf · Small non coding RNA...


Page 1

JOBIM

3 July 2012

Page 2

Chondrichthyans

Teleostomi

Page 3

Scyliorhinus canicula (dogfish) genome sequencing

Ongoing project with Genoscope started

3.5 Gbases

Illumina paired-end sequencing, 32×

Draft assembly: 3 449 662 contigs, N50: 1 292 bp

Draft assembly of Callorhinchus milii (elephant shark)

910 Mbases

Sanger + 454, 1.4×, 633 833 contigs, N50: 1 466 bp

Draft assembly of Leucoraja erinacea (little skate)

3.42 Gbases

Illumina paired-end, 26×, 2 962 365 contigs, N50: 665 bp
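The N50 values above summarise how fragmented each draft assembly is. A minimal Python sketch of the usual definition (sort contigs longest first and report the length at which the running total first reaches half of the assembled bases); assemblers and QC tools may differ slightly in how they treat the exact 50% boundary.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    if not lengths:
        raise ValueError("cannot compute N50 of an empty assembly")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:  # half of the bases are now covered
            return length
    return lengths[-1]  # not reached: the running total always ends at sum(lengths)


print(n50([100, 200, 300, 400, 500]))  # 400, since 500 + 400 = 900 >= 1500 / 2
```

A low N50 such as 665 bp means that half of the assembly sits in contigs shorter than about 0.7 kb, which is still long enough to contain a typical pre-miRNA hairpin.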

Page 4

Transcriptome project

PEPTISAN project

Sequencing done by Genoscope

Libraries for mRNA

Two normalised libraries (non-directional / directional)

Illumina paired-end sequencing (~412 M, ~316 M)

Poster on the transcriptome assembly (Pierre Pericard)

Two small RNA libraries

Adult and Embryo libraries

Illumina paired-end sequencing, 51 nt reads

Goal: de novo identification of miRNAs

Page 5

Small non-coding RNAs

Post-transcriptional regulators of mRNA transcripts

Discovery of lin-4 in C. elegans in 1993

Pre-miRNA structure

miRNA conservation

miR-143 alignment: miRNA* · loop · miRNA

Zebrafish .....GAUCUACAGUCGUCUGGCCCGCGGUGCAGUGCUGCAUCUCUGGUCAACUGGGAGUCUGAGAUGAAGCACUGUAGCUCGGGAGGACAACACUGUCAGCUC.....

Medaka UGGUUCUGGUCCAUCUCUGCUGCCCAUGGUGCAGUGCUGCAUCUCUGGUCAGUUGAUAGUCUGAGAUGAAGCACUGUAGCUCGGGACGGAGGGCAGGAGUCUCAGUCUG

Xenopus ............UGUCUCCCAGCCCAAGGUGCAGUGCUGCAUCUCUGGUCAGUUGUGAGUCUGAGAUGAAGCACUGUAGCUCGGGAAGGGGGAAU..............

Human .GCGCAGCGCCCUGUCUCCCAGCCUGAGGUGCAGUGCUGCAUCUCUGGUCAGUUGGGAGUCUGAGAUGAAGCACUGUAGCUCAGGAAGAGAGAAGUUGUUCUGCAGC..

Mouse ......................CCUGAGGUGCAGUGCUGCAUCUCUGGUCAGUUGGGAGUCUGAGAUGAAGCACUGUAGCUCAGG........................

Rat .GCGGAGCGCC.UGUCUCCCAGCCUGAGGUGCAGUGCUGCAUCUCUGGUCAGUUGGGAGUCUGAGAUGAAGCACUGUAGCUCAGGAAGGGAGAAGAUGUUCUGCAGC..

Cow ......GCGUCCUGUCUCCCAGCCUGAGGUGCAGUGCUGCAUCUCUGGUCAGUUGGGAGUCUGAGAUGAAGCACUGUAGCUCGGGAAGGGAGAAGUUGUUCUGCAGC..

Pig .............GUCCCCCAGCCGGAGGUGCAGUGCUGCAUCUCUGGUCAGCUGGGAGUCUGAGAUGAAGCACUGUAGCUCGGGAAGGGAGA................

Opossum ......................CCCGAGGUGCAGUGCUGCAUCUCUGGUCAGUUGUGAGUCUGAGAUGAAGCACUGUAGCUCGGG........................

Lizard ...........AUGUCUCCCAGCCCAAGGUGCAGUGCUGCAUCUCUGGUCAGUUGUGAGUCUGAGAUGAAGCACUGUAGCUCGGGAAGGGAGGAAC.............

GAGUAAA UA UA GA U

5’ CCUUG G GCAGCACA AUGGUUUGUG UU U

||||| | |||||||| |||||||||| || G

3’ GGAAC C CGUCGUGU UACCGGACGU AA A

AUAAAAA UC UA GG A

miRNA*

miRNA
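In the hairpin drawn above, the miRNA and miRNA* arms pair into an imperfect stem that also contains bulges and G·U wobble pairs. Folding tools such as RNAfold commonly report such a stem in dot-bracket notation; the sketch below (using a toy hairpin, not a real pre-miRNA) recovers the base pairs from a dot-bracket string and flags any pair that is neither Watson-Crick nor G·U.

```python
from typing import List, Tuple

# Watson-Crick pairs plus the G-U wobble pair
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def non_canonical_pairs(seq: str, structure: str) -> List[Tuple[int, int]]:
    """List the pairs whose bases are neither Watson-Crick nor G-U."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    return [(i, j) for i, j in base_pairs(structure)
            if (seq[i], seq[j]) not in CANONICAL]


# Toy hairpin (hypothetical): a three-pair stem closed by a four-base loop
print(base_pairs("(((....)))"))                        # [(0, 9), (1, 8), (2, 7)]
print(non_canonical_pairs("GGGAAAUCAC", "(((....)))"))  # [(1, 8)]: G facing A
```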

Page 6

Page 7

Illumina paired-end sequencing: Adult and Embryo libraries

Cleaning: PRINSEQ, FLASH and cutadapt; sequences < 17 nt or > 27 nt, sequences with no adaptor, and rRNA, tRNA and other ncRNA (Rfam) are removed, leaving high-quality sequences of 17–27 nt

Prediction: miRDeep2 with the S. canicula draft genome and miRBase 18.0, giving putative miRNAs (mature, star, pre-miRNA)

Validation: MIReNA, CIDmiRNA, Triplet-SVM, Plant-miRNAPred, Microprocessor SVM, MFE, randfold, PHDcleav, and conservation in the C. milii and L. erinacea genomes
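The size-selection part of the cleaning stage keeps only reads of 17–27 nt. A minimal sketch, assuming adapters have already been removed; it does not reproduce PRINSEQ, FLASH or cutadapt. The three 22–25 nt reads are the inserts from the example read pairs on Page 9; the 16 nt and 28 nt reads are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

MIN_LEN, MAX_LEN = 17, 27  # size window used in the workflow


def size_select(reads: Iterable[str]) -> Tuple[List[str], Dict[int, int]]:
    """Keep reads whose length lies in [MIN_LEN, MAX_LEN] and return them
    together with the length distribution of the kept reads."""
    kept = [read for read in reads if MIN_LEN <= len(read) <= MAX_LEN]
    histogram = dict(sorted(Counter(len(read) for read in kept).items()))
    return kept, histogram


reads = [
    "UUCCCAAGACUGUGAAACCCUU",     # 22 nt
    "AGGGCCCGGAUAGCUCAGUCGGUAG",  # 25 nt
    "GAAUACCAGGUGCAGUAGGCUU",     # 22 nt
    "ACGUACGUACGUACGU",           # 16 nt, too short (hypothetical)
    "ACGU" * 7,                   # 28 nt, too long (hypothetical)
]
kept, histogram = size_select(reads)
print(len(kept), histogram)  # 3 {22: 2, 25: 1}
```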

Page 8

Workflow stages: Cleaning → Prediction → Validation

Page 9

@PHOSPHORE_0144:8:1101:1512:2663#GGCUAC/1

UUCCCAAGACUGUGAAACCCUU UGGAAUUCUCGGGUGCCAAGGAACUCCAG

@PHOSPHORE_0144:8:1101:1699:2666#GGCUAC/1

AGGGCCCGGAUAGCUCAGUCGGUAG UGGAAUUCUCGGGUGCCAAGGAACUC

@PHOSPHORE_0144:8:1101:1503:2691#GGCUAC/1

GAAUACCAGGUGCAGUAGGCUU UGGAAUUCUCGGGUGCCAAGGAACUCCAG

@PHOSPHORE_0144:8:1101:1512:2663#GGCUAC/2

AAGGGUUUCACAGUCUUGGGAA GAUCGUCGGACUGUAGAACUCUGAACGUG

@PHOSPHORE_0144:8:1101:1699:2666#GGCUAC/2

CUACCGACUGAGCUAUCCGGGCCCU GAUCGUCGGACUGUAGAACUCUGAAC

@PHOSPHORE_0144:8:1101:1503:2691#GGCUAC/2

AAGCCUACUGCCCCUGGUAUUC GAUCGUCGGACUGUAGAACUCUGAACGUG

UUCCCAAGACUGUGAAACCCUU UGGAAUUCUCGGGUGCCAAGGAACUCCAG

CACGUUCAGAGUUCUACAGUCCGACGAUC UUCCCAAGACUGUGAAACCCUU

AGGGCCCGGAUAGCUCAGUCGGUAG UGGAAUUCUCGGGUGCCAAGGAACUC

GUUCAGAGUUCUACAGUCCGACGAUC AGGGCCCGGAUAGCUCAGUCGGUAG

GAAUACCAGGUGCAGUAGGCUU UGGAAUUCUCGGGUGCCAAGGAACUCCAG

CACGUUCAGAGUUCUACAGUCCGACGAUC GAAUACCAGGGGCAGUAGGCUU

• PRINSEQ (Schmieder and Edwards 2011, Bioinformatics)

• cutadapt (Martin 2011, EMBnet.journal)

• FLASH (Magoč and Salzberg 2011, Bioinformatics)
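In each read pair above, read 1 (/1) holds the small RNA insert followed by the 3′ adapter, and read 2 (/2) holds the reverse complement of the same insert followed by the reverse complement of the 5′ adapter; the lower block lines each read 1 up with its reverse-complemented mate. The sketch below trims both mates and counts how many positions disagree, the kind of consistency that merging overlapping mates relies on. It is a simplified illustration, not the algorithm of cutadapt or FLASH; exact adapter matching and the two adapter strings (copied from the reads shown) are assumptions.

```python
from typing import List, Optional, Tuple

# Adapter sequences as they appear after the inserts in the reads above
# (read 1: 3' adapter; read 2: reverse complement of the 5' adapter).
ADAPTER_R1 = "UGGAAUUCUCGGGUGCCAAGG"
ADAPTER_R2 = "GAUCGUCGGACUGUAGAACUCUGAAC"
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an RNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trim(read: str, adapter: str, min_overlap: int = 5) -> Optional[str]:
    """Cut the read at the adapter; also accept an adapter truncated by the
    end of the read (>= min_overlap bases). Return None if no adapter."""
    pos = read.find(adapter)
    if pos >= 0:
        return read[:pos]
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return None


def pair_mismatches(read1: str, read2: str) -> Optional[int]:
    """Trim both mates and count differences between insert 1 and the
    reverse complement of insert 2 (None if they cannot be compared)."""
    insert1, insert2 = trim(read1, ADAPTER_R1), trim(read2, ADAPTER_R2)
    if insert1 is None or insert2 is None:
        return None
    mate = revcomp(insert2)
    if len(insert1) != len(mate):
        return None
    return sum(a != b for a, b in zip(insert1, mate))


PAIRS: List[Tuple[str, str]] = [  # the three read pairs shown above
    ("UUCCCAAGACUGUGAAACCCUU" "UGGAAUUCUCGGGUGCCAAGGAACUCCAG",
     "AAGGGUUUCACAGUCUUGGGAA" "GAUCGUCGGACUGUAGAACUCUGAACGUG"),
    ("AGGGCCCGGAUAGCUCAGUCGGUAG" "UGGAAUUCUCGGGUGCCAAGGAACUC",
     "CUACCGACUGAGCUAUCCGGGCCCU" "GAUCGUCGGACUGUAGAACUCUGAAC"),
    ("GAAUACCAGGUGCAGUAGGCUU" "UGGAAUUCUCGGGUGCCAAGGAACUCCAG",
     "AAGCCUACUGCCCCUGGUAUUC" "GAUCGUCGGACUGUAGAACUCUGAACGUG"),
]
for r1, r2 in PAIRS:
    print(trim(r1, ADAPTER_R1), pair_mismatches(r1, r2))
# The first two pairs agree exactly; the third differs at one position.
```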

Workflow figure: Cleaning stage (PRINSEQ, FLASH, cutadapt; high-quality sequences of 17–27 nt)

Page 10

                 Embryo        Adult         All
Initial reads    89,766,100    81,179,402    170,945,502
Cleaned reads    82,325,424    65,651,400    147,976,824

Figure: frequency plot (y axis: Frequency)
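The counts above also give the share of reads each library kept through cleaning. A short Python check using only those counts:

```python
READS = {  # library: (initial reads, cleaned reads), from the table above
    "Embryo": (89_766_100, 82_325_424),
    "Adult": (81_179_402, 65_651_400),
    "All": (170_945_502, 147_976_824),
}


def retention(initial: int, cleaned: int) -> float:
    """Fraction of reads kept by the cleaning step."""
    return cleaned / initial


for library, (initial, cleaned) in READS.items():
    print(f"{library}: {retention(initial, cleaned):.1%} kept, "
          f"{initial - cleaned:,} reads removed")
# Embryo: 91.7% kept, 7,440,676 reads removed
# Adult: 80.9% kept, 15,528,002 reads removed
# All: 86.6% kept, 22,968,678 reads removed
```

Cleaning therefore removed a noticeably larger share of the adult reads (about 19%) than of the embryo reads (about 8%).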

Page 11

Figure: frequency plot, with miR-143-3p labelled

Page 12

Workflow figure: Prediction stage (miRDeep2 with the S. canicula draft genome and miRBase 18.0, giving putative mature, star and pre-miRNA sequences)

miRDeep2: Friedländer et al. 2008, Nature Biotechnology

Page 13

Pre-miRNA structural information

miRNA and miRNA* information:

Reads for both the miRNA and the miRNA*

Higher expression of the miRNA than of the miRNA*

3’ overhang of about 2 nt in the miRNA/miRNA* duplex

Sequence conservation
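The expression criterion above (the miRNA is more abundant than the miRNA*) can be sketched as follows: count the reads falling on each arm of the hairpin and call the dominant arm the mature miRNA. This is a simplified illustration with hypothetical counts, not miRDeep2's actual scoring function; ties are arbitrarily given to the 5′ arm here.

```python
import math
from typing import Dict, Tuple


def assign_arms(arm_counts: Dict[str, int]) -> Tuple[str, str, float]:
    """Call the more abundant arm ('5p' or '3p') the mature miRNA and the
    other one the miRNA*; also return the mature/star read ratio."""
    five, three = arm_counts.get("5p", 0), arm_counts.get("3p", 0)
    if five == 0 and three == 0:
        raise ValueError("no reads on either arm")
    if five >= three:
        mature, star, m, s = "5p", "3p", five, three
    else:
        mature, star, m, s = "3p", "5p", three, five
    ratio = m / s if s else math.inf  # no star reads at all
    return mature, star, ratio


print(assign_arms({"5p": 12, "3p": 480}))  # ('3p', '5p', 40.0)
```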

Page 14

Modification of miRDeep2

Variability of miRDeep2 results related to randfold

Putative new miRNAs

2,445 new miRNAs with score ≥ 0

1,103 new miRNAs with score ≥ 5 (10% expected false positives)
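Taken at face value, a 10% expected false-positive rate among the 1,103 candidates with score ≥ 5 corresponds roughly to the estimate below; the actual number of false positives is not known.

```latex
\widehat{\mathrm{FP}} \approx 0.10 \times 1103 \approx 110,
\qquad
\widehat{\mathrm{TP}} \approx 1103 - 110 = 993
```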

Page 15

Conserved miRNAs

170 miRNAs similar to miRNAs from other species

15 rejected after manual inspection (2 with score > 5)

155 accepted known miRNAs (21 with score < 5)

NNNUNNNNNANNNUNNNNNNCUNNNNNNNANNNNGANGNU

GUUNCAGGGNACANUCAACGNNGUCGGUGNGUUUNNUNCNA

|||N|||||N|||N||||||NN|||||||N||||NN|N|

CGANGUUCCNUGUNAGUUGCNNCAGCUACNCAAANNANGNU

NNNUNNNNNANNNUNNNNNN--NNNNNNN-NNNNG-NGNU

contig_452580_14256 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNAACAUUCAACGCUGUCGGUGAGUNNNNNNNNNNNNNNNNNACCAUCGACCGUUGAUUGUACC

NNNNNNNNNNNNNNNNNNNNGUUUCAGGGAACAUUCAACGCUGUCGGUGAGUUUGAUGCUAUUGGAGAAACCAUCGACCGUUGAUUGUACCUUGUAGC

GAAUUCUGCUUCGAAUGGUUGCUUCAGUGAACAUUCAACGCUGUCGGUGAGUUUGGAAUUAAAGUAGAAACCAUCGACCGUUGAUUGUACCCUGCGGCAACCACCGUCCU

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNAACAUUCAACGCUGUCGGUGAGUNNNNNNNNNNNNNNNNNACCAUCGACCGUUGAUUGUACC

oan-mir-181a (Ornithorhynchus anatinus, platypus)

GCUU AA U U A U CU A GGAAU

CG UGGUUGCU CAG G ACA UCAACG GUCGGUG GUUU U

|| |||||||| ||| | ||| |||||| ||||||| |||| A

GC ACCAACGG GUC C UGU AGUUGC CAGCUAC CAAA A

UCCU -C C C A U -- - GAUGA

Page 16

Comparison of conserved miRNAs with other species

C. milii (elephant shark) and L. erinacea (little skate)

131 identified in C. milii, 152 identified in L. erinacea, 154 altogether

Previously identified chondrichthyan miRNAs (Heimberg et al. 2011)

104 S. canicula miRNAs mapped on C. milii scaffolds

All 104 miRNAs identified in S. canicula

mir-301 alignment: miRNA* · loop · miRNA

sca-mir-301 UGUCGGAGGCUCUGACGAUAUUGCACUACUGUACUCACAGU-UAAGCAGUGCAAUAGUAUUGUCAAAGCGUCAGGCACC

cmi-mir-301 UGUCGGAGGCUCUGACGAUAUUGCACUACUGUCCUCACCGU-UAAGCAGUGCAAUAGUAUUGUCAAAGCGUCAGGCAAC

ler-mir-301 UGUCGGGCGCUCUGACGAUAUUGCACUACUGUCCGCACAGCUAAAGCAGUGCAAUAGUAUUGUCAAAGCGUCAGGCACC

hsa-mir-301a ACUGCUAACGAAUGCUCUGACUUUAUUGCACUACUGUACUUUACAG-CUAGCAGUGCAAUAGUAUUGUCAAAGCAUCUGAAAGCAGG

mmu-mir-301a CCUGCUAACGGCUGCUCUGACUUUAUUGCACUACUGUACUUUACAG-CGAGCAGUGCAAUAGUAUUGUCAAAGCAUCCGCGAGCAGG

pma-mir-301a CUUGCAAGCCCCUGCUGGAGGCUCUGACACCAUUGCACUACUGUACGCAAUGG-UGAGCAGUGCAAUUGUAUUGUCAAAGCUUCCGUCGGUGAGCCCA

G G C --- A GU U

UGUC GA GCU UGACGAUAU UGCACU CU AC C

|||| || ||| ||||||||| |||||| || || A

ACGG CU CGA ACUGUUAUG ACGUGA GA UG C

A G A AUA C AU A
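The three counts on this slide fix the overlap between the two species by inclusion–exclusion. Writing A and B for the conserved S. canicula miRNAs identified in C. milii and in L. erinacea, and reading "altogether" as the union:

```latex
|A \cap B| = |A| + |B| - |A \cup B| = 131 + 152 - 154 = 129
```

So 129 miRNAs are found in both species, 131 − 129 = 2 only in C. milii and 152 − 129 = 23 only in L. erinacea. If the comparison started from all 155 conserved miRNAs, one of them was found in neither genome.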

Page 17

miRBase miRNAs absent from the data set

blastn of all miRBase miRNAs against the genome assembly

24 potential new conserved miRNAs

2 identified by miRDeep2 but not classified as conserved

23444 522851

AAAG-UUCUGUCAUACACUCAGGCU UCAGUGCAUCACAGAACUUUGA

contig_3412856_61753 CUCGAGCUAAAG-UUCUGUCAUACACUCAGGCUGCAGAUACACA-AGGUCAGUGCAUCACAGAACUUUGAUUCGGG

rno-mir-148b UUGAGGUGAAG-UUCUGUUAUACACUCAGGCUGUGGCU-CUGA-AAGUCAGUGCAUCACAGAACUUUGUCUCG

cmi CCCAAGCUGAAG-UUCUGUCAUACACUCAGGCUGUAGCUAAUGG-AAGUCAGUGCAUCACAGAACUUUGACUCGAGAU

ler CUCAAGCCAAAGGUUCUGUCAUACACUUUGGCUCUGUCGCUGGG-AAGUCAGUGCAUGACAGAACUUUG

C C A CA GCAGA

CUCGAG UAAAGUUCUGU AU CACU GGCU U

|||||| ||||||||||| || |||| ||||

GGGCUU GUUUCAAGACA UA GUGA CUGG A

A C C -- AACAC

1425623 19236

UGAGAACUGAAUUCCAUGGGC UCCAUAGUAGACAGUUCUCCAG

contig_2512524_51750 UUCCCAGCUAUGAGAACUGAAUUCCAUGGGCUGGUUGCACACUUUAUUUC-UCAGUCCAUAGUAGACAGUUCUCCAGCUUGGCUGCU

gga-mir-146c-1 UUCCCAGCUCUGAGAACUGAAUUCCAUGGACUGGUUUCAAUUCCAUGCGU-UCAGUCCAUGGUAUUCAGUUCUCUAGCUUGGCUGC

cmi CCAGCUGUGAGAACUGAAUUCCAUGGGCUGGUCACGCAGUUUUCUUCCUCAGUCCAUAGUAGUCAGUUCUUCCGUUUGGCUGCU

ler UUCCUGGCUCUGAGAACUGAAUUCCAUGGGCUGGUUGUUCACAUUAUUUC-UCAGUCCAUAGUAG-CAGUUCUCCGGCUUGGCUGCU

---UUCCCA AU AAUUCC UUGCACA

GCU GAGAACUG AUGGGCUGG C

||| |||||||| |||||||||

CGA CUCUUGAC UACCUGACU U

UCGUCGGUU C- AGAUGA CUUUAUU
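The blastn screen of miRBase against the assembly can be post-processed with a few lines of Python. The sketch assumes blastn was run with tabular output (`-outfmt 6`, whose default 12 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and keeps ungapped hits covering the whole mature miRNA with at most one mismatch. These thresholds and the example IDs are hypothetical, not the criteria used in the talk.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse BLAST tabular (-outfmt 6) lines, skipping blanks and comments."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]),
                  int(f[5]), int(f[6]), int(f[7]))


def full_length_hits(hits: Iterable[Hit], query_lengths: Dict[str, int],
                     max_mismatch: int = 1) -> List[Hit]:
    """Keep ungapped hits spanning the whole query with few mismatches."""
    return [h for h in hits
            if h.gapopen == 0
            and h.qstart == 1 and h.qend == query_lengths[h.query]
            and h.mismatch <= max_mismatch]


# Hypothetical miRNA IDs, contigs and mature lengths
blast_lines = [
    "mirA\tcontig_1\t95.45\t22\t1\t0\t1\t22\t100\t121\t1e-04\t36.2",
    "mirB\tcontig_2\t100.00\t18\t0\t0\t3\t20\t5\t22\t1e-02\t34.2",
]
kept = full_length_hits(parse_outfmt6(blast_lines), {"mirA": 22, "mirB": 22})
print([h.query for h in kept])  # ['mirA']: mirB misses its first two bases
```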

Page 18

Workflow figure: Validation stage (MIReNA, CIDmiRNA, Triplet-SVM, Plant-miRNAPred, Microprocessor SVM, MFE, randfold, PHDcleav, conservation in C. milii and L. erinacea)

Page 19

Several potential tools to validate miRNA predictions

MIReNA (Mathelier and Carbone 2010, Bioinformatics)

Microprocessor SVM: prediction of the Drosha cleavage site (Helvik et al. 2007, Bioinformatics)

PHDcleav: prediction of the Dicer cleavage site (http://www.imtech.res.in/raghava/phdcleav)

Randfold: mono-/dinucleotide and Markov randomisation (Bonnet et al. 2004, Bioinformatics)

Plant-miRNAPred: ath 82.65%, hsa 85.77% (http://nclab.hit.edu.cn/PlantMiRNAPred)

Evaluating tool accuracy

Robust control data set (Ritchie et al. 2012, Bioinformatics)

129 positive controls: M. musculus miRNAs with associated publications

682 negative controls: sequences from NGS samples validated as non-miRNAs

Conserved miRNAs identified with miRDeep2

Page 20

miRNA validation tools

                        S. canicula                 Control data set
Tool                    Sensitivity  Specificity    Sensitivity  Specificity
miRDeep2                87.1%        86.7%          77.5%        99.1%
Plant-miRNAPred         94.8%        80.0%          97.7%        75.4%
MIReNA                  91.6%        86.7%          95.3%        92.4%
RNAfold (MFE)           95.5%        73.3%          96.1%        56.5%
Randfold d 999          94.2%        86.7%          87.6%        96.0%
Randfold m 999          81.3%        93.3%          71.3%        99.9%
Randfold s 999          96.1%        86.7%          95.3%        94.9%
Triplet-SVM             92.9%        73.3%          86.8%        91.5%
Microprocessor SVM      57.4%        100.0%         64.3%        98.8%
PHDcleav                72.9%        86.7%          64.3%        68.9%
Blastn other species    99.4%        46.7%          88.4%        92.8%
CIDmiRNA                93.5%        86.7%          93.8%        95.2%
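Sensitivity is the fraction of true miRNAs a tool accepts, TP / (TP + FN), and specificity the fraction of non-miRNAs it rejects, TN / (TN + FP). The sketch below maps the miRDeep2 control-set percentages back to counts, assuming the 129 positive and 682 negative controls described on Page 19; the counts 100 and 676 are back-computed from the rounded percentages, not reported in the talk.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true miRNAs (positives) that a tool accepts."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-miRNAs (negatives) that a tool rejects."""
    return tn / (tn + fp)


POSITIVES, NEGATIVES = 129, 682  # control data set sizes (Page 19)
tp, tn = 100, 676                # back-computed for miRDeep2, illustrative only

print(f"sensitivity = {sensitivity(tp, POSITIVES - tp):.1%}")  # 77.5%
print(f"specificity = {specificity(tn, NEGATIVES - tn):.1%}")  # 99.1%
```

By the same arithmetic, every S. canicula sensitivity in the table is a whole number out of 155 (e.g. 87.1% = 135/155) and every S. canicula specificity a whole number out of 15 (e.g. 86.7% = 13/15). This would be consistent with the 155 accepted and 15 rejected conserved miRNAs serving as positives and negatives, although the slides do not state it.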

Page 21

miRNA validation tools (same table as Page 20)

Page 22

Combinations of all tools

Conserved miRNAs passing all tests: 83 / 155

Which criteria and thresholds should be applied?

Candidate | miRDeep2 score | Plant-miRNAPred | MIReNA | RNAfold MFE | randfold d 999 | randfold m 999 | randfold s 999 | Triplet-SVM | Microprocessor SVM | PHDcleav | Blastn other species | CIDmiRNA
contig_2184464_47128 | 1.9 | 1 | -1 | -19.8 | 0.90% | 2.20% | 0.10% | 1 | -0.90 | 1.28 | 1 | -1
contig_1435315_35146 | 50529.5 | 1 | 1 | -33.2 | 0.10% | 0.30% | 0.10% | 1 | -0.04 | 0.25 | 1 | 1
contig_2147172_46625 | 4.7 | -1 | -1 | -24.1 | 8.70% | 14.80% | 1.60% | 1 | -1.32 | 2.01 | 1 | -1
contig_1446688_35335 | 25916.3 | 1 | 1 | -35.3 | 0.10% | 0.10% | 0.10% | 1 | 0.52 | 2.37 | 1 | 1

46910 1

contig_2147172_46625 UGUGGUGAACUAGCAGCACAUAAUGGUUUGUGAGUUGUAUGGAGAUGCAGGCCACAUUGUGCUGCCACAUGAAC

hsa-miR-15a CCUUGGAGUAAAGUAGCAGCACAUAAUGGUUUGUGGAUUUUGAAAAGGUGCAGGCCAUAUUGUGCUGCCUCAAAAAUACAAGG

GGUGAACUA UAA GA GU GAGUAAA UA UA GA U

GCAGCACA UGGUUUGU GUU A CCUUG G GCAGCACA AUGGUUUGUG UU U

|||||||| |||||||| ||| U ||||| | |||||||| |||||||||| || G

CGUCGUGU ACCGGACG UAG G GGAAC C CGUCGUGU UACCGGACGU AA A

CAAGUACAC UAC -- AG AUAAAAA UC UA GG A

Page 23

Support Vector Machine (SVM)

Supervised learning method that analyses data and recognises patterns; used for classification and regression analysis

Takes a set of input data and predicts, for each input, which of two possible classes it belongs to: a non-probabilistic binary classifier, linear in the feature space defined by its kernel

Parameters: C-SVC with a polynomial kernel

What is the best combination of tools?

All possible combinations of the validation tools/parameters were tested

4,095 combinations (2^12 − 1 non-empty subsets of the 12 tools/parameters), 1 optimum with the minimum number of tools
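With 12 tools/parameters there are 2^12 − 1 = 4,095 non-empty combinations, so an exhaustive search is cheap. A minimal sketch of that search, where `evaluate` is a hypothetical stand-in for training and scoring the C-SVC on one combination (for example, sensitivity plus specificity on the control data set), and ties are broken in favour of fewer tools:

```python
from itertools import combinations
from typing import Callable, Iterator, Sequence, Tuple

TOOLS = ["miRDeep2", "Plant-miRNAPred", "MIReNA", "RNAfold MFE",
         "randfold d", "randfold m", "randfold s", "Triplet-SVM",
         "Microprocessor SVM", "PHDcleav", "Blastn other species", "CIDmiRNA"]


def all_subsets(items: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every non-empty subset of items, smallest first."""
    for k in range(1, len(items) + 1):
        yield from combinations(items, k)


def best_subset(items: Sequence[str],
                evaluate: Callable[[Tuple[str, ...]], float]) -> Tuple[str, ...]:
    """Highest-scoring subset; among equal scores, the one with fewest tools."""
    best: Tuple[str, ...] = ()
    best_key = None
    for subset in all_subsets(items):
        key = (evaluate(subset), -len(subset))
        if best_key is None or key > best_key:
            best, best_key = subset, key
    return best


def toy_score(subset: Tuple[str, ...]) -> float:
    """Hypothetical evaluator that only rewards MIReNA and CIDmiRNA."""
    return float(len({"MIReNA", "CIDmiRNA"} & set(subset)))


print(sum(1 for _ in all_subsets(TOOLS)))  # 4095
print(best_subset(TOOLS, toy_score))       # ('MIReNA', 'CIDmiRNA')
```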

Page 24

Figure: the twelve validation criteria (MIReNA, CIDmiRNA, Triplet-SVM, Blastn, Plant-miRNAPred, Microprocessor SVM, MFE, Randfold m, PHDcleav, Randfold d, Randfold s, miRDeep2)

Page 25

S. canicula                 Control data set
Sensitivity  Specificity    Sensitivity  Specificity
100.0%       93.3%          96.9%        99.0%

Page 26

Supplementary filters

Remapping of the reads onto the hairpin with no mismatch

At least 5 sequences corresponding to the mature miRNA

Removal of predictions with fragments in the loop or at the 3’ and 5’ ends of the pre-miRNA

968 potential new miRNAs

155 conserved miRNAs + 24 conserved miRNAs found in the genome but not in the data set
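The three supplementary filters can be sketched as one function. Assumptions, not taken from the talk: reads are remapped by exact substring search, the mature and star arms are given as 0-based half-open coordinates on the hairpin, and any read that remaps outside both arms (loop, 5′ or 3′ end) rejects the prediction. The toy hairpin and reads are hypothetical.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # 0-based, half-open [start, end) on the hairpin


def _inside(start: int, end: int, span: Span) -> bool:
    return span[0] <= start and end <= span[1]


def passes_filters(hairpin: str, mature: Span, star: Span,
                   reads: Iterable[str], min_mature_reads: int = 5) -> bool:
    """Apply the remapping, mature-count and fragment filters to one hairpin."""
    mature_reads = 0
    for read in reads:
        start = hairpin.find(read)  # exact remapping: no mismatch allowed
        if start < 0:
            continue  # read does not remap onto this hairpin
        end = start + len(read)
        if _inside(start, end, mature):
            mature_reads += 1
        elif not _inside(start, end, star):
            return False  # fragment in the loop or at the 5'/3' ends
    return mature_reads >= min_mature_reads


# Toy hairpin: 5' end | star arm | loop | mature arm | 3' end (hypothetical)
hairpin = "GG" + "ACGUACGUAC" + "UUUU" + "GUACGUACGU" + "CC"
star, mature = (2, 12), (16, 26)
print(passes_filters(hairpin, mature, star, ["GUACGUACGU"] * 5 + ["ACGUACGUAC"]))  # True
print(passes_filters(hairpin, mature, star, ["GUACGUACGU"] * 4))                   # False
print(passes_filters(hairpin, mature, star, ["GUACGUACGU"] * 5 + ["CUUUUG"]))      # False
```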

Page 27

Accurate miRNA set for S. canicula

Phylogenetic analysis

Chondrichthyan-specific genes

When the genome is available

Analysis to be redone

Comparison with CDS to remove contaminants

Target prediction

Differential expression, adult vs. embryo

piRNA identification

Page 28

The transcriptome and small RNA studies were supported by the environmental and functional genomics CPER research initiative and by PEPTISAN project funding from the Bretagne region. Thanks to FASTERIS and Genoscope for RNA library construction and sequencing.

The Scyliorhinus canicula genome sequencing project is carried out in collaboration with Genoscope.

Thanks to the organisers of JOBIM

Thanks for your attention
