Post on 01-Aug-2020

Learn more at www.genespring.com

Introduction to RNA-Seq in GeneSpring NGS Software

Dipa Roy Choudhury, Ph.D. Strand Scientific Intelligence and Agilent Technologies

Introduction to RNA-Seq

• In just a few years, massively parallel cDNA sequencing, or RNA-seq, has enabled many advances in the characterization and quantification of transcriptomes.

• Rapidly decreasing sequencing costs and massively parallel sequencing technologies have resulted in a dramatic increase in the quantity of data that needs to be analyzed.

• The need of the day is therefore a tool that enables the analysis and integration of data produced on multiple platforms and using multiple methods.

• Agilent has designed NGS analysis in GeneSpring with the biologist in mind: someone who is interested in answering real biological questions and does not want to become a bioinformatics expert just to do their work.

GeneSpring NGS

Agilent SureSelect Target Enrichment

5 Map Reads to the Ref. Genome

6 Quality Control on Mapping

7 Detect SNPs/InDels or Diff. Spliced Genes

8 Find Biological Relevance for your Results

GeneSpring NGS Provides Downstream Analysis of Next-Gen Sequencing Data

[Figure: data flow from the sequencer to GeneSpring NGS. Labels: Control Software; Data File (Reads + Quality); ELAND/BIOSCOPE/BWA…; Reads aligned to genome; GeneSpring NGS. Stages: Primary Analysis, Secondary Analysis, Tertiary Analysis. File formats: FASTQ, …; BAM, …]

Questions we can seek to answer using the GeneSpring RNA-Seq Workflow:

What are the differentially expressed genes?

What are the differentially spliced genes?

Are there any SNPs in the transcriptome?

Can we identify gene fusion events?

Can we identify novel genes, novel exons, and novel splice junctions?

Import Data and Annotations

Download Human build HG18 from the Agilent Server

Open the Human “tree”

Select hg18 and Homologene Groups

Click [UPDATE] to start the download

Annotation files can be quite large

Baits and Targets for SureSelect Catalog Kits are Pre-loaded

List of Agilent SureSelect Catalog Kits

Creating a new Experiment

Click the “New Experiment” icon to start a new experiment

Or select “New Experiment” from the Project menu

Create New GeneSpring NGS Experiment

Provide a useful and descriptive name

Be sure to select “NGS” as the Analysis type

Importing Data: Choosing Metadata

New organisms supported on demand

Indicate organism, build and transcript model

Prepackaged annotations available for a variety of organisms

Support for Illumina, Life Technologies, and Roche 454

Support for single-end, paired-end, and mate-pair protocols

Provide information on sequencing platform and library layout for SureSelect-specific kits

Be sure to select the previously loaded Target region

GeneSpring NGS: Organization of the Windows

Project and Experiment Navigator

Workflow Browser

Region View

Quality Control

Perform QC on reads and filter anomalous reads

View reads by tile/lane and remove reads in anomalous tiles

Base qualities in the same tiles are bad

Perfect match reads and too many mismatch reads in this tile.
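
The tile-level filter described above can be sketched roughly as follows. This is a minimal illustration, not GeneSpring NGS's actual method: the `Read` type, the function names, and the mean-quality cutoff of 20 are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

@dataclass
class Read:
    name: str
    tile: int
    quals: list  # Phred base qualities (hypothetical representation)

def anomalous_tiles(reads, min_mean_qual=20.0):
    """Return the set of tiles whose mean base quality is below the cutoff."""
    per_tile = defaultdict(list)
    for r in reads:
        per_tile[r.tile].extend(r.quals)
    # Tiles with no quality values at all are skipped rather than flagged.
    return {t for t, q in per_tile.items() if q and mean(q) < min_mean_qual}

def drop_anomalous(reads, min_mean_qual=20.0):
    """Keep only reads from tiles that pass the quality cutoff."""
    bad = anomalous_tiles(reads, min_mean_qual)
    return [r for r in reads if r.tile not in bad]
```

A real QC step would likely also look at mismatch rates per tile, as the slide callout suggests; mean quality is used here only to keep the sketch short.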

Quality Inspection

Open Quality Control Manager

Press Compute to calculate the Library QC metrics

QC Manager feature in SureSelect experiments

Targeted Regions (SureSelect Baits)

Off-target reads can be removed to focus analysis
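
Removing off-target reads amounts to an interval-overlap test against the target regions. The sketch below assumes reads and targets are plain tuples with half-open coordinates; the function names and tuple layout are hypothetical and chosen only for illustration.

```python
def overlaps(start, end, region_start, region_end):
    """True if half-open intervals [start, end) and [region_start, region_end) overlap."""
    return start < region_end and region_start < end

def on_target(reads, targets):
    """Keep reads that overlap at least one target region on the same chromosome.

    reads:   iterable of (name, chrom, start, end)
    targets: iterable of (chrom, start, end), e.g. bait regions
    """
    by_chrom = {}
    for chrom, s, e in targets:
        by_chrom.setdefault(chrom, []).append((s, e))
    return [
        r for r in reads
        if any(overlaps(r[2], r[3], s, e) for s, e in by_chrom.get(r[1], []))
    ]
```

For genome-scale data an interval tree or sorted sweep would be a better fit than this linear scan per read.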

Determine the expression values for each gene

Run Quantification to determine raw gene counts

Which reads contribute to a gene’s count?

These reads do NOT contribute

Only reads overlapping exonic regions contribute to the read count

Multiply mapping reads contribute fractionally to the count
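
The counting rule above can be sketched as follows. The slide says only that multiply mapping reads contribute "fractionally"; splitting each read equally as 1/n across its n mapped locations is an assumption of this example, as are the function name and tuple layouts.

```python
from collections import defaultdict

def gene_counts(alignments, exons):
    """Sum fractional read counts per gene.

    alignments: list of (read_name, chrom, start, end), one entry per mapped location
    exons:      list of (gene, chrom, start, end), half-open coordinates
    """
    # Number of locations each read maps to; a read with n hits adds 1/n per hit.
    n_hits = defaultdict(int)
    for name, _, _, _ in alignments:
        n_hits[name] += 1

    counts = defaultdict(float)
    for name, chrom, start, end in alignments:
        # Genes with at least one exon overlapping this alignment.
        hit_genes = {
            gene for gene, e_chrom, e_start, e_end in exons
            if e_chrom == chrom and start < e_end and e_start < end
        }
        for gene in hit_genes:
            counts[gene] += 1.0 / n_hits[name]
    return dict(counts)
```

Reads that fall entirely outside exons (for example intronic reads) match no gene and add nothing to any count.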

Quantification

Quantify Genes, Transcripts and Exons

Reverted to the All Aligned Reads list to establish a baseline. Feel free to try a different (filtered) read list.

When unchecked, only reads falling completely inside an exon are counted.

New genes and exons are discovered using conservation data (Conservation track in Annotations)

Filter genes by RPKM: Results of Filtering

Profile plot of genes that pass the filter criteria
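
For reference, RPKM (Reads Per Kilobase of exon model per Million mapped reads) is the read count divided by the exon length in kilobases and by the total mapped reads in millions. The sketch below computes it and applies a simple filter; the cutoff values and the `filter_by_rpkm` helper are illustrative assumptions, not the filter options GeneSpring NGS exposes.

```python
def rpkm(count, exon_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon model per Million mapped reads."""
    return count * 1e9 / (exon_length_bp * total_mapped_reads)

def filter_by_rpkm(gene_rpkms, min_rpkm=1.0, min_samples=1):
    """Keep genes whose RPKM reaches min_rpkm in at least min_samples samples.

    gene_rpkms: dict mapping gene -> list of RPKM values, one per sample
    """
    return {
        gene: values for gene, values in gene_rpkms.items()
        if sum(v >= min_rpkm for v in values) >= min_samples
    }
```

For example, 500 reads on a gene with 2,000 bp of exons in a sample of 10 million mapped reads gives 500 / (2 kb × 10 M) = 25 RPKM.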

Handling overlapping genes

Which reads contribute to a gene’s count?

These reads contribute to both genes, except for ABI data, which is strand specific

Overlapping gene on negative strand

Gene on positive strand
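
The effect of strand specificity on overlapping genes can be illustrated with a small sketch. It assumes the read's reported strand equals the strand of the transcript it came from, which depends on the library protocol; the function name and tuple layouts are hypothetical.

```python
def genes_for_read(read, genes, strand_specific=False):
    """Return names of genes a read counts toward.

    read:  (chrom, start, end, strand)
    genes: list of (name, chrom, start, end, strand), half-open coordinates
    """
    chrom, start, end, strand = read
    hits = []
    for name, g_chrom, g_start, g_end, g_strand in genes:
        if g_chrom != chrom or not (start < g_end and g_start < end):
            continue  # different chromosome or no overlap
        if strand_specific and g_strand != strand:
            continue  # stranded data: only the same-strand gene is credited
        hits.append(name)
    return hits
```

With unstranded data a read in the overlap is credited to both genes; with stranded data only the gene on the matching strand receives it.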

Determine expression values normalized across samples

Scatter Plot between Two Replicates

Detection of differentially expressed genes

Output of Differential Expression Analysis

P-values, corrected p-values and fold changes

Volcano Plot
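
The slides do not say which statistical test or multiple-testing correction is used. As one common choice, the sketch below assumes p-values are already available and applies the Benjamini-Hochberg procedure, alongside a log2 fold change with a pseudocount. Both function names and the pseudocount of 1 are assumptions for illustration.

```python
import math

def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """log2 ratio of condition B to condition A, with a pseudocount to avoid log(0)."""
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A volcano plot then places each gene at x = log2 fold change and y = -log10(p-value), so genes that are both strongly changed and significant appear in the upper corners.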

Determine differentially spliced genes
Identifying Differentially Spliced Genes

Compute the proportion of a gene’s count that can be ascribed to a particular transcript.

If the proportion for a particular transcript changes substantially across conditions, the gene is said to be differentially spliced.
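
The proportion-based definition above can be sketched as follows. What counts as a "substantial" change is not specified in the slides, so this example only reports the largest shift and leaves the cutoff to the caller; the function names are hypothetical.

```python
def transcript_proportions(transcript_counts):
    """Fraction of a gene's total count ascribed to each transcript."""
    total = sum(transcript_counts.values())
    if total == 0:
        return {t: 0.0 for t in transcript_counts}
    return {t: c / total for t, c in transcript_counts.items()}

def max_proportion_shift(counts_a, counts_b):
    """Largest absolute change in any transcript's proportion between two conditions."""
    pa = transcript_proportions(counts_a)
    pb = transcript_proportions(counts_b)
    return max(abs(pa.get(t, 0.0) - pb.get(t, 0.0)) for t in set(pa) | set(pb))
```

For example, a transcript that accounts for 80% of a gene's count in one condition and 30% in the other shifts by 0.5, even if the gene's overall count is unchanged.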

The Challenge: Deconvoluting Transcript Read Counts

Which of the 4 transcripts do these reads come from?

Differential Splicing Analysis

View Results in Gene View

Ensure the Splicing Analysis Results entity list is selected

Click the “Gene View” icon to show Gene View

Differential Splicing Analysis

Gene View

[Figure: Gene View showing the gene’s RPKM and the RPKMs of its 4 transcripts. Callouts: “This transcript is expressed less in Tumor”; “This transcript is expressed more in Tumor”; “Possible New Exon”.]

Determine SNPs in the transcriptome

GeneSpring NGS has a built-in SNP calling algorithm

Set filters for SNP statistical significance

Set filters for min number of overlapping reads and min number of overlapping variant reads
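
The three filters named above (significance, overlapping reads, overlapping variant reads) can be pictured as a simple predicate over candidate SNPs. The dictionary keys and default thresholds below are assumptions for illustration; the slides do not give the actual defaults or the scoring used by the built-in caller.

```python
def passes_filters(candidate, min_coverage=10, min_variant_reads=3, max_pvalue=0.05):
    """Apply coverage, variant-support, and significance filters to one candidate SNP.

    candidate: dict with hypothetical keys 'coverage' (reads overlapping the site),
               'variant_reads' (reads carrying the variant), and 'pvalue'
    """
    return (
        candidate["coverage"] >= min_coverage
        and candidate["variant_reads"] >= min_variant_reads
        and candidate["pvalue"] <= max_pvalue
    )

def filter_snps(candidates, **thresholds):
    """Return the candidates that pass all filters."""
    return [c for c in candidates if passes_filters(c, **thresholds)]
```

Raising the minimum variant-read count is the usual lever for suppressing calls driven by a handful of sequencing errors at well-covered sites.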

GeneSpring NGS calls transcript effects for each SNP and allows filtering of SNPs based on these effects

Types of effects predicted

Change in Amino Acid for Non-synonymous SNPs
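
Predicting the amino acid change for a coding SNP comes down to translating the reference and mutated codons with the standard genetic code. The sketch below assumes the codon and the SNP's position within it are already known on the coding strand; the effect labels and the `snp_effect` function are illustrative and not the effect categories GeneSpring NGS reports.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def snp_effect(codon, pos_in_codon, alt_base):
    """Classify a coding SNP and return (effect, ref_amino_acid, alt_amino_acid).

    codon:        reference codon on the coding strand, e.g. "GAG"
    pos_in_codon: 0, 1, or 2
    alt_base:     the variant base
    """
    mutated = codon[:pos_in_codon] + alt_base + codon[pos_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "stop gained"
    elif ref_aa == "*":
        effect = "stop lost"
    else:
        effect = "non-synonymous"
    return effect, ref_aa, alt_aa
```

For example, changing the middle base of GAG to T gives GTG, a glutamate-to-valine substitution, which is classified as non-synonymous.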

Viewing SNPs in the Genome Browser

Color-coded indicator for a Homozygous SNP

Known in dbSNP

GeneSpring NGS SNP Call

In a Repeat Region

Determine chimeric transcripts or fusion genes

Identify Fusion Genes

In a K562 Leukemia cell line, GeneSpring NGS confirms the well-known BCR-ABL1 gene fusion.

Filters set on the Genome Browser to show only translocated reads

Several read pairs for the BCR gene on chr22 with mates translocated to the ABL1 gene on chr9

The corresponding paired reads for the ABL1 gene on chr9
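
The evidence shown on this slide, read pairs whose two mates map to different chromosomes and land in two different genes, can be tallied with a sketch like the one below. The function names, the minimum of two supporting pairs, and the tuple layouts are assumptions; this is not a description of GeneSpring NGS's fusion detection.

```python
from collections import Counter

def gene_at(chrom, pos, genes):
    """Name of the first gene containing the position, or None."""
    for name, g_chrom, g_start, g_end in genes:
        if g_chrom == chrom and g_start <= pos < g_end:
            return name
    return None

def fusion_candidates(read_pairs, genes, min_pairs=2):
    """Count read pairs whose mates fall in two different genes on different chromosomes.

    read_pairs: list of ((chrom1, pos1), (chrom2, pos2))
    genes:      list of (name, chrom, start, end)
    """
    support = Counter()
    for (c1, p1), (c2, p2) in read_pairs:
        if c1 == c2:
            continue  # keep only inter-chromosomal ("translocated") pairs
        g1, g2 = gene_at(c1, p1, genes), gene_at(c2, p2, genes)
        if g1 and g2 and g1 != g2:
            support[tuple(sorted((g1, g2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_pairs}
```

Requiring several independent pairs helps separate a real fusion from isolated chimeric artifacts; same-chromosome fusions would need an additional distance or orientation rule that this sketch omits.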

Detection of novel genes, exons, and splice junctions

Identify Novel Exons and Genes

In a mouse myoblast study, GeneSpring NGS determines a new exon for the FHL3 gene

Read clumps not aligned with a known exon

Novel exon determined by GeneSpring NGS, probably a new transcription start site

Add exon to gene if close to or within the gene, otherwise call it a new gene
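
The assignment rule on this slide (attach the exon to a gene if it is within or close to that gene, otherwise call a new gene) can be sketched as below. The slides do not define "close", so the `max_distance` of 1,000 bp is a made-up value, and the function name and tuple layouts are hypothetical.

```python
def assign_novel_exon(exon, genes, max_distance=1000):
    """Return the gene a novel exon is attached to, or None to call a new gene.

    exon:  (chrom, start, end), half-open coordinates
    genes: list of (name, chrom, start, end)
    """
    chrom, start, end = exon
    best_name, best_gap = None, None
    for name, g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue
        # Gap is 0 when the exon overlaps or lies within the gene.
        gap = max(0, g_start - end, start - g_end)
        if gap <= max_distance and (best_gap is None or gap < best_gap):
            best_name, best_gap = name, gap
    return best_name
```

An exon just upstream of a gene, as in the possible new transcription start site shown here, would be attached to that gene as long as the gap stays under the chosen distance.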

Identify Novel Splice Junctions

In a brain tissue expression study, GeneSpring NGS determines a new splice junction in the DTX3 gene when considering only RefSeq transcripts; this novel splice junction is corroborated by a UCSC transcript.

Solid lines show spliced reads connecting the 1st and 3rd exons of the RefSeq transcript

The corresponding novel splice junction found by GeneSpring NGS

Indeed, a known UCSC transcript that is not present in RefSeq validates this discovery
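
Whether a junction is "novel" depends on the transcript model it is compared against, which is the point of this example. The sketch below derives known junctions from a transcript model and reports junctions seen in spliced reads that the model lacks. The function names, the minimum of two supporting reads, and the coordinates are assumptions for illustration.

```python
from collections import Counter

def known_junctions(transcripts):
    """Collect (chrom, donor_end, acceptor_start) for consecutive exons of each transcript.

    transcripts: dict name -> (chrom, [(exon_start, exon_end), ...]) sorted by position
    """
    junctions = set()
    for chrom, exons in transcripts.values():
        for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
            junctions.add((chrom, left_end, right_start))
    return junctions

def novel_junctions(spliced_reads, transcripts, min_reads=2):
    """Junctions supported by at least min_reads spliced reads and absent from the model.

    spliced_reads: list of (chrom, donor_end, acceptor_start), one entry per read
    """
    known = known_junctions(transcripts)
    support = Counter(spliced_reads)
    return {j for j, n in support.items() if n >= min_reads and j not in known}
```

A junction that skips the middle exon is reported as novel against a three-exon model, and stops being novel once a model containing the two-exon transcript is added, mirroring the RefSeq versus UCSC comparison on the slide.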

Biological Contextualization

Pathway Analysis

Agilent GeneSpring NGS for SureSelect

Display the Results on a Pathway

Questions we can seek to answer using the GeneSpring RNA-Seq Workflow

What are the differentially expressed genes?

What are the differentially spliced genes?

Are there any SNPs in the transcriptome?

Can we identify gene fusion events?

Can we identify novel genes, novel exons, and novel splice junctions?

Summary

• Differential expression and splicing analysis

• Novel gene, exon and alternative splicing discovery

• Gene fusion analysis

• SNP & InDel discovery and annotation with dbSNP

Summary in General

• GeneSpring NGS supports both SureSelect RNA-Seq experiments and RNA-Seq experiments that don’t use SureSelect.

• The workflow steps in GeneSpring NGS are application specific and change based on whether you are analyzing a DNA-Seq or RNA-Seq experiment.

• It is possible to integrate data produced on multiple platforms and using multiple methods in the same GeneSpring project.

• Multiple visualization tools are available to query the data.

Thank you

dipa@strandsi.com

informatics_support@agilent.com

http://www.AVADIS-NGS.com