Introductory RNA-seq Transcriptome Profiling of the hy5 mutation in Arabidopsis thaliana.

Before we start: Align sequence reads to the reference genome

The most time-consuming part of the analysis is aligning the reads (in Sanger FASTQ format) from all replicates to the reference genome.
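To make "Sanger FASTQ format" concrete, here is a minimal sketch of reading FASTQ records and decoding their quality strings. It assumes the standard Sanger convention (Phred scores stored as ASCII code minus 33) and four lines per record; the function names and the example read are hypothetical and not part of the module.

```python
def phred33_scores(quality):
    """Decode a Sanger-format (Phred+33) quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(score):
    """Convert a Phred score Q into the base-call error probability 10^(-Q/10)."""
    return 10 ** (-score / 10)


def read_fastq(lines):
    """Yield (name, sequence, quality) tuples, assuming 4 lines per record."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between or after records
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


# Hypothetical single-record example, for illustration only.
example = ["@read1", "ACGT", "+", "II5!"]
for name, seq, qual in read_fastq(example):
    print(name, seq, phred33_scores(qual))
```

In this encoding the character `I` corresponds to a score of 40 and `!` to a score of 0, which is why low-quality bases show up as punctuation near the start of the ASCII table.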

Overview: This training module is designed to provide hands-on experience in using RNA-Seq for transcriptome profiling.

Question: How well is the annotated transcriptome represented in RNA-seq data in Arabidopsis WT and hy5 genetic backgrounds?

How can we compare gene expression levels in the two samples?

RNA-seq in the Discovery Environment

Scientific Objective

LONG HYPOCOTYL 5 (HY5) is a basic leucine zipper transcription factor (TF).

Mutations in the HY5 gene cause aberrant phenotypes in Arabidopsis morphology, pigmentation and hormonal response.

We will use RNA-seq to compare the transcriptomes of seedlings from WT and hy5 genetic backgrounds to identify HY5-regulated genes.

Samples

• Experimental data were downloaded from the NCBI Sequence Read Archive (SRA; GEO accessions GSM613465 and GSM613466).

• Two replicate RNA-seq runs each for wild-type (WT) and hy5 mutant seedlings.

Specific Objectives

By the end of this module, you should:

1) Be more familiar with the Discovery Environment (DE) user interface

2) Understand the starting data for RNA-seq analysis

3) Be able to align short sequence reads with a reference genome in the DE

4) Be able to analyze differential gene expression in the DE

5) Be able to use DE text manipulation tools to explore the gene expression data

Quick Summary

1) Download Reads from SRA

2) Export Reads to FASTQ

3) Align to Genome: TopHat

4) View Alignments: IGV

5) Differential Expression: CuffDiff

6) Find Differentially Expressed Genes

Pre-Configured: Getting the RNA-seq Data

• Import SRA data from NCBI SRA

• Extract FASTQ files from the downloaded SRA archives

RNA-Seq Conceptual Overview

Image source: http://www.bgisequence.com

RNA-Seq Workflow Overview

Step 1: Align Reads to the Genome

Align the four FASTQ files to the Arabidopsis genome using TopHat

TopHat

• TopHat is one of many applications for aligning short sequence reads to a reference genome.

• It uses the Bowtie aligner internally.

• Alternatives include BWA, MAQ, Stampy, Novoalign, etc.
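In this module TopHat is run through the Discovery Environment, but it can help to see what an equivalent command-line call might look like. The sketch below only builds the commands and does not run them. The FASTQ file names, the Bowtie index prefix `athaliana_index`, and the output directory names are hypothetical; `-p`, `-o`, and `-G` are TopHat's options for thread count, output directory, and a reference annotation file.

```python
import shlex

# Hypothetical file names for the four runs; the real names depend on your data.
SAMPLES = {
    "WT-1": "WT_rep1.fastq",
    "WT-2": "WT_rep2.fastq",
    "hy5-1": "hy5_rep1.fastq",
    "hy5-2": "hy5_rep2.fastq",
}


def tophat_command(sample, fastq, bowtie_index="athaliana_index",
                   annotation=None, threads=4):
    """Build (but do not run) a TopHat command line for one sample."""
    cmd = ["tophat", "-p", str(threads), "-o", f"tophat_out_{sample}"]
    if annotation:
        cmd += ["-G", annotation]  # optional gene annotation (GTF/GFF)
    cmd += [bowtie_index, fastq]   # positional: index prefix, then reads
    return cmd


for sample, fastq in SAMPLES.items():
    print(shlex.join(tophat_command(sample, fastq)))
```

Keeping one output directory per sample avoids overwriting results, since TopHat typically writes its alignments to a file with the same name (`accepted_hits.bam`) in each output directory.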

RNA-seq Sample Read Statistics

• Genome alignments from TopHat were saved as BAM files, the binary version of the SAM format.

• Reads retained by TopHat are shown below:

Sequence run    WT-1          WT-2          hy5-1         hy5-2

Reads           10,866,702    10,276,268    13,410,011    12,471,462

Seq. (Mbase)    445.5         421.3         549.8         511.3
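As a quick sanity check on the table, dividing the sequence yield by the number of reads gives the mean read length for each run. The sketch below uses only the figures from the table; the variable and function names are illustrative.

```python
# (reads retained, sequence in Mbase) for each run, copied from the table above.
runs = {
    "WT-1": (10_866_702, 445.5),
    "WT-2": (10_276_268, 421.3),
    "hy5-1": (13_410_011, 549.8),
    "hy5-2": (12_471_462, 511.3),
}


def mean_read_length(reads, mbases):
    """Mean bases per read, given a read count and a yield in megabases."""
    return mbases * 1e6 / reads


for name, (reads, mbases) in runs.items():
    print(f"{name}: {mean_read_length(reads, mbases):.1f} bases per read")
```

All four runs come out at roughly 41 bases per read, which suggests the runs share the same read length and that the read and megabase figures are consistent with each other.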

Prepare BAM files for viewing

Index BAM files using SAMtools
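Outside the Discovery Environment, the indexing step corresponds to `samtools index`, which by default writes a `.bai` index next to the BAM file so that viewers can jump to a region without reading the whole file. The sketch below is one possible wrapper, assuming `samtools` is installed and the BAM file is coordinate-sorted; the helper names and the example path are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def index_command(bam):
    """Command line that asks SAMtools to index one BAM file."""
    return ["samtools", "index", str(bam)]


def expected_index(bam):
    """Path of the index file SAMtools writes by default (<name>.bam.bai)."""
    return Path(str(bam) + ".bai")


def index_bam(bam):
    """Run the indexing command; requires samtools on the PATH."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on the PATH")
    subprocess.run(index_command(bam), check=True)
    return expected_index(bam)


# Hypothetical path, shown without running anything.
print(" ".join(index_command("tophat_out_WT-1/accepted_hits.bam")))
print(expected_index("tophat_out_WT-1/accepted_hits.bam"))
```

A viewer such as IGV expects the index to sit in the same directory as the BAM file, so both files should be copied together.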

Using IGV in Atmosphere

1. We have already launched an instance of NGS Viewers in Atmosphere

2. Use a VNC client to connect to your remote desktop

Pre-configured VM for NGS Viewers

The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.

Use IGV to inspect outputs from TopHat

http://www.broadinstitute.org/igv/

Integrative Genomics Viewer (IGV)

ATG44120 (12S seed storage protein) is significantly down-regulated in the hy5 mutant background (> 9-fold, p = 0). Compare with the gene on the right, which lacks differential expression.

Other Ways to View Alignment Data

WIG -> Ensembl

RNA-Seq Workflow Overview

CuffDiff

• CuffLinks is a program that assembles aligned RNA-Seq reads into transcripts, estimates their abundances, and tests for differential expression and regulation transcriptome-wide.

• CuffDiff is a program within the CuffLinks package that compares transcript abundance between samples.
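To give a feel for what "abundance" and "differential expression" mean numerically, here is a simplified sketch. It assumes abundance is expressed as FPKM (fragments per kilobase of transcript per million mapped fragments) and that the change between two samples is summarized as a log2 fold change. The real tools fit a statistical model that also handles multiple isoforms and ambiguous reads, so this is the textbook definition rather than their actual estimator; the function names and example numbers are made up.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fold_change(value_1, value_2):
    """log2 of sample 2 abundance over sample 1; negative means lower in sample 2."""
    return math.log2(value_2 / value_1)


# Made-up numbers: 500 fragments on a 2,000 bp transcript, 10 million mapped fragments.
print(fpkm(500, 2000, 10_000_000))
# Made-up abundances: 8.0 in sample 1 and 2.0 in sample 2 (a four-fold decrease).
print(log2_fold_change(8.0, 2.0))
```

Normalizing by transcript length and by sequencing depth is what makes values comparable across genes and across runs of different sizes. A gene with zero abundance in one sample has no finite fold change and needs separate handling.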

Examining Differential Gene Expression

Examining the Gene Expression Data

Differentially expressed genes

Filter CuffDiff results for up- or down-regulated gene expression in hy5 seedlings

Differentially expressed genes

Example filtered CuffDiff results generated with Filter_CuffDiff_Results to:

1) Select genes with a minimum two-fold expression difference

2) Select genes with significant differential expression (q <= 0.05)

3) Add gene descriptions
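The three filtering steps can be sketched in a few lines of Python. This is not the Filter_CuffDiff_Results tool itself, only an illustration of the same logic. The column names, gene identifiers, values, and descriptions below are all made up, and a two-fold difference is taken to mean an absolute log2 fold change of at least 1.

```python
import csv
import io
import math

# Made-up, tab-separated rows for illustration only; not real CuffDiff output.
TABLE = (
    "gene_id\tvalue_1\tvalue_2\tlog2_fold_change\tq_value\n"
    "GENE_A\t40.0\t5.0\t-3.0\t0.001\n"
    "GENE_B\t10.0\t12.0\t0.263\t0.60\n"
    "GENE_C\t2.0\t9.0\t2.17\t0.04\n"
    "GENE_D\t3.0\t12.0\t2.0\t0.20\n"
)

# Hypothetical gene descriptions used for step 3.
DESCRIPTIONS = {"GENE_A": "example description A", "GENE_C": "example description C"}


def filter_results(rows, min_fold=2.0, max_q=0.05):
    """Keep rows with at least min_fold change and q <= max_q, then annotate them."""
    threshold = math.log2(min_fold)
    kept = []
    for row in rows:
        log2fc = float(row["log2_fold_change"])
        if abs(log2fc) >= threshold and float(row["q_value"]) <= max_q:
            kept.append(dict(
                row,
                direction="down" if log2fc < 0 else "up",
                description=DESCRIPTIONS.get(row["gene_id"], ""),
            ))
    return kept


rows = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))
for row in filter_results(rows):
    print(row["gene_id"], row["direction"], row["q_value"], row["description"])
```

With these made-up rows, GENE_B is dropped for failing both thresholds and GENE_D is dropped because its q-value exceeds 0.05 despite a four-fold change, which shows why both criteria are applied together.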