Download - Iden%ﬁcaonoflongnon.codingRNAsin livestock*species** · Iden%ﬁcaonoflongnon.codingRNAsin livestock*species** * * OanaPalasca * February15th,2018*

Iden%fica%on of long non-‐coding RNAs in livestock species

Oana Palasca February 15th, 2018

Overview

•  hGp://www.faang-‐europe.org •  Community effort to establish a database of ncRNAs in

livestock species •  Annota%on based on experimental evidence approach (RNA-‐

Seq) •  Ini%ated as a hackathon in October 2016

2

Background

Long non coding RNAs •  Capped, polyadenylated, alterna%vely spliced transcripts •  Roles in regula%on of transcrip%on/transla%on, chroma%n modifica%on

etc Challenges in iden%fica%on of lncRNA genes

•  LiGle sequence conserva%on – difficult to predict “de novo” •  Low expression levels – difficult to dis%nguish from transcrip%onal noise •  Low consistency between biological replicates

3

McMullen et al, Clinical Science, 2016

Background Current annota%on status, Ensembl 91

4

0

5000

10000

human mouse pig cow sheep horse chicken

LincRNAs type

genestranscripts

0

50000

100000

150000

human mouse pig cow sheep horse chicken

Prot

ein

codi

ng

typegenestranscripts

Workflow

Data collec%on and filtering

1


Ø  Data obtained from the European Nucleo%de Archive

Ø  Cow, horse, pig, sheep, chicken Ø  Selec%on criteria: Illumina, paired-‐ended,

stranded, >100 bp

1

Patro et. Al, Nature Methods, 2017 1




stranded, >100 bp

Ø  Quality check – FastQC Ø  Strandedness check – Salmon

Ø  15% samples unstranded Ø  10% samples with first read mapping to

the forward strand

1





stranded, >100 bp

Ø  Quality check – FastQC Ø  Strandedness check – Salmon

Ø  15% samples unstranded Ø  10% samples with first read mapping to

the forward strand

Ø  ~ 900 samples, 32 Brenda %ssue ontology terms

Ø  Muscle and brain available for all species

1


Workflow

Comparison of pipelines Star /Cufflinks (SC) vs. Hisat2/String%e (HS)

Input: Ø  3 samples from chicken (kidney, liver, heart – pooled) Ø  5 samples from cow (heart, cerbral cortex, spleen, liver, kidney) Ø  With/without annota%on

12

1

Dobin et. al, 2013 1

2

Trapnell et.al, 2010 2 3 Pertea et.al, 2016

3

Comparison of pipelines Total number of transcripts, per %ssue and by merging all %ssues

Size of the transcript sets obtained by running the two pipelines, COW

0

25000

50000

75000

kidney liver heart merged

trans

crip

ts pipelineSC annotSC w/o annotHS annotHS w/o annot

0

20000

40000

60000

80000

heart c.cortex spleen liver kidney merged

trans

crip


Size of the transcript sets obtained by running the two pipelines, CHICKEN

0

25000

50000

75000


trans

crip


Comparison of pipelines Impact of the merging step string%e-‐merge on the SC output and cuffmerge on the HS output

Size of the transcript sets obtained by running the two pipelines, COW

0

25000

50000

75000

kidney liver heart merged merge_swap

trans

crip


Size of the transcript sets obtained by running the two pipelines, CHICKEN

0

25000

50000

75000


trans

crip


0

20000

40000

60000

80000

heart c.cortex spleen liver kidney merged merge_swap

trans

crip


Comparison of pipelines Recovery rate compared to annota%on

Number of transcripts exactly overlapping the annota%on

0

10

20

30

chicken cow%

from

pre

d. tr

ansc

ripts

pipelineSC w/o annotHS w/o annot

0

2000

4000

6000

chicken cow

num

ber o

f tra

nscr

ipts


0

10

20

30

chicken cow

% fr

om p

red.

tran

scrip

ts


Percentage of transcripts exactly overlapping the annota%on from the total transcripts predicted

Comparison of pipelines Assessment of capacity of reconstruc%ng lncRNAs in mouse

Data •  30 RNA-‐Seq mouse samples from ENCODE/CSHL, illumina, paired-‐ended, 100bp •  Gencode mouse annota%on down-‐sampled such as to contain a set of genes and

transcripts similar to e.g. cow (e.g. 20.000 PCGs orthologous with cow + 3000 randomly selected PCGs + 2000 miRNas/snoRNAs)

16




Pipelines to test •  STAR>Cufflinks-‐>cuffmerge/string%e-‐merge-‐>FEELnc •  HiSat2-‐>String%e-‐>string%e-‐merge/cuffmerge-‐>FEELnc

17




Pipelines to test •  STAR>Cufflinks-‐>cuffmerge/string%e-‐merge-‐>FEELnc •  HiSat2-‐>String%e-‐>string%e-‐merge/cuffmerge-‐>FEELnc Output •  Sensi%vity and specificity for predic%ng lncRNAs and PCGs, based on the exact

overlap with the remaining set of the annota%on

18

Further work

•  Synteny analysis •  Correla%on of expression levels across organisms •  Structural predic%ons

•  Public database, connected to exis%ng repositories, e.g. EBI

19

Acknowledgements

•  Daniel Fischer •  Sarah Djebali •  Chris%an Anton •  Alicja Pacholewska •  Nadezhda Doncheva •  Thomas Derrien •  Lel Eory •  Frank Panitz •  Magda Mielczarek •  Christa Kuhn •  Alan Archibald •  Jan Gorodkin

20

Download - Iden%ﬁcaon*of*long*non.coding*RNAs*in* livestock*species** · Iden%ﬁcaon*of*long*non.coding*RNAs*in* livestock*species** * * OanaPalasca * February*15th,*2018*

Download - Iden%ﬁcaonoflongnon.codingRNAsin livestock*species** · Iden%ﬁcaonoflongnon.codingRNAsin livestock*species** * * OanaPalasca * February15th,2018*