Next Generation Sequencing Data Analysis with BioHPC · 2019-06-27 · Next Generation Sequencing...

Post on 02-Aug-2020


Next Generation Sequencing Data Analysis with BioHPC

Updated for 2015-04-15

Next Generation Sequencing

Genomic and transcriptomic sequencing are now commonplace in projects. Now very cheap!

UTSW McDermott Core typical pricing:

Whole Genome PE100 $7,500

Whole Transcriptome PE100 $875

Most common experiment across the University:

Use RNA-Seq to identify gene expression changes in response to a stimulus or caused by a disease.

Let’s focus on this today – but you can do other things on our systems!

Typical RNA Seq Workflow


Prepare / obtain samples for different conditions (A and B)

Extract RNA and prepare library for sequencing

Run library on Illumina sequencer

Obtain short-read sequences

Typical RNA Seq Workflow – Data Analysis


Check quality and/or filter reads

Align to the genome or transcriptome

Quantify transcript abundance across conditions

Identify significant differences in expression between conditions
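As a rough illustration of the first step above (check quality and/or filter reads), here is a minimal Python sketch that drops reads with a low mean base quality. It assumes four-line FASTQ records with Phred+33 quality encoding. The cutoff of 20 and the two example reads are made up for illustration. Dedicated quality-control tools do far more than this.

```python
import io

def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality

def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def filter_reads(handle, min_mean_quality=20.0):
    """Keep reads whose mean base quality reaches the (hypothetical) cutoff."""
    return [rec for rec in read_fastq(handle) if mean_phred(rec[2]) >= min_mean_quality]

# Two made-up reads: 'I' encodes Phred 40, '#' encodes Phred 2.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n####\n")
kept = filter_reads(example)
print([header for header, _, _ in kept])
```

Only the first read passes the cutoff here. The later steps (alignment, quantification, differential testing) are handled by dedicated tools rather than hand-written code.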

Why Use BioHPC?

Powerful processing and lots of storage!

72 x 32/48 core nodes to run mapping that can take days on a PC.

More than 1 petabyte of storage. No need to shuffle data between drives when working on large projects.

Storage is fast. Best at the things NGS analysis does most (accessing large files sequentially).

Tools to help you!

NGS Pipeline: Standard workflows with little effort. Best for beginners.

Galaxy: Powerful environment with many tools, workflow designer.

Batch Scripts: Various NGS tools available as modules on the cluster, for expert users.

But the sequencing core does it for me…

This is NOT an attempt to replace the comprehensive services that a sequencing core provides, where data analysis is performed as part of the sequencing service.

But…

– Now common to need to integrate existing public data into projects

– Common to obtain data from collaborators, outside facilities

– Labs often have students/postdocs who have received NGS analysis training

– More flexibility – many tools available to create complex pipelines

Use our services with caution.

You *should* have a basic understanding of the limitations of the techniques.

Option 1 - BioHPC NGS Pipeline


BioHPC Portal -> Cloud Services -> NGS Pipeline (ngs.biohpc.swmed.edu)

Common workflows, made easy. Currently RNA-Seq Differential Expression Analysis.

Option 2 - BioHPC Galaxy Service


BioHPC Portal -> Cloud Services -> Galaxy (galaxy.biohpc.swmed.edu)

Reproducible workflows, with many available tools, via the web. Widely used by many institutions.

Option 3 - Modules and Sequence Data / Indices


module avail

/project/apps_database/iGenomes

Common NGS tools and Illumina iGenomes databases are available on the cluster. Experts can write their own pipelines using cluster sbatch jobs.
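To give a feel for what a hand-written pipeline step might look like, here is a hedged Python sketch that builds the text of a Slurm batch script for one TopHat alignment. Several things in it are assumptions, not confirmed BioHPC settings: the module name (`tophat`), the placeholder partition name, the time limit, the read file names, and the sub-path below the iGenomes directory. The sub-path follows the usual iGenomes layout. Check `module avail` and the cluster documentation for real values.

```python
def make_tophat_job(sample, reads_1, reads_2, index_prefix,
                    threads=32, partition="PARTITION_NAME", module="tophat"):
    """Return the text of a Slurm batch script that aligns one paired-end sample.

    The partition and module names are placeholders, not confirmed BioHPC values.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name=tophat_{sample}",
        f"#SBATCH --partition={partition}",
        "#SBATCH --nodes=1",
        "#SBATCH --time=24:00:00",  # assumed time limit
        f"#SBATCH --output=tophat_{sample}.%j.out",
        "",
        f"module load {module}",
        # tophat: -p sets the thread count, -o the output directory
        f"tophat -p {threads} -o tophat_{sample} {index_prefix} {reads_1} {reads_2}",
    ]
    return "\n".join(lines) + "\n"

IGENOMES = "/project/apps_database/iGenomes"  # path shown on the slide
# Assumed sub-path, following the usual iGenomes directory layout.
index = f"{IGENOMES}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome"

# Hypothetical sample name and read files.
script = make_tophat_job("brain", "brain_1.fastq", "brain_2.fastq", index)
print(script)
```

Saving the printed text to a file and passing it to `sbatch` would queue the job. Generating scripts this way makes it easy to submit one job per sample.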

Today we are going to…


Follow a simple and real-world RNA-SEQ differential expression analysis using:

• The BioHPC NGS Pipeline

• BioHPC Galaxy Service

Try it out with your own data!

Email biohpc-help@utsouthwestern.edu with questions

Bring your problems to the BioHPC drop-in coffee session next week!

Example 1 – Brain vs Adrenal

A ‘toy’ example I can show you in real time (hopefully!)

75,000 reads from chr19, extracted from a larger study

2 Conditions – brain tissue vs adrenal tissue

What’s the difference in expression for the limited number of transcripts we can see in this data?

Courtesy Galaxy Project, Illumina Body Map:

https://usegalaxy.org/u/jeremy/p/galaxy-rna-seq-analysis-exercise
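The difference in expression between two conditions is commonly summarized as a log2 fold change. The Python sketch below shows only the arithmetic. The transcript names and abundance values are made up for illustration and are not results from this data set. The pseudocount of 1 is an assumption that keeps zero abundances finite.

```python
import math

def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated / control), with a pseudocount so zero abundances stay finite."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

# Made-up abundance values (e.g. FPKM) for illustration only.
abundance = {
    "transcript_A": {"adrenal": 3.0, "brain": 15.0},
    "transcript_B": {"adrenal": 7.0, "brain": 7.0},
    "transcript_C": {"adrenal": 31.0, "brain": 0.0},
}

for name, values in abundance.items():
    lfc = log2_fold_change(values["adrenal"], values["brain"])
    print(f"{name}: log2 fold change (brain vs adrenal) = {lfc:+.2f}")
```

A positive value means higher abundance in brain, a negative value higher abundance in adrenal. A fold change alone says nothing about significance, which needs replicates and a statistical model.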


Example 2 – HIF1α, HIF2α Single and Double siRNA knock-down in MCF7 Cells

A real study from a lab I used to work in.

Public data downloaded via EMBL-EBI ArrayExpress.

We’ll take 4 conditions, all samples MCF7 cells subjected to hypoxia:

– Control (scrambled siRNA)

– HIF1A knock-down by siRNA

– HIF2A knock-down by siRNA

– HIF1A + HIF2A double knock-down by siRNA

2 replicates for each condition. Illumina HiSeq platform.
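To make the size of this design concrete, the Python sketch below lists the samples and the possible pairwise comparisons for 4 conditions with 2 replicates each. The condition labels and the sample naming scheme are hypothetical, chosen only for this illustration.

```python
from itertools import combinations

conditions = ["control", "HIF1A_kd", "HIF2A_kd", "HIF1A_HIF2A_kd"]
replicates = 2

# One label per sequenced sample; the naming scheme is hypothetical.
samples = [f"{cond}_rep{rep}" for cond in conditions for rep in range(1, replicates + 1)]

# Every unordered pair of conditions that could be tested against each other.
comparisons = list(combinations(conditions, 2))

print(len(samples), "samples")
print(len(comparisons), "pairwise comparisons")
for first, second in comparisons:
    print(f"{first} vs {second}")
```

This gives 8 samples and 6 possible pairwise comparisons. In practice the comparisons against the control are usually the ones of most interest.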


Extensive regulation of the non-coding transcriptome by hypoxia: role of HIF in releasing paused RNApol2. Hani Choudhry, Johannes Schödel, Spyros Oikonomopoulos, Carme Camps, Steffen Grampp, Adrian L Harris, Peter J Ratcliffe, Jiannis Ragoussis, David R Mole. EMBO reports (2014) 15, 70-76. DOI 10.1002/embr.201337642. Published online 21.12.2013.

TopHat / Cufflinks Pipeline


NGS Pipeline Demo

See Handouts


Galaxy Demo

See Handouts

Acknowledgements

NGS Pipeline

Developed by Yi Du in conjunction with CRI, Zhiyu Zhao.

Galaxy

Many thanks to John Chilton, Martin Čech, Nicola Soranzo, Andrew Robinson, Dannon Baker for assistance incorporating BioHPC-required changes into the Galaxy project.