Gene expression profiling using Ion semiconductor sequencing

4
Introduction Next-generation sequencing has transformed gene expression profiling and transcriptome studies. While other methods rely on a priori knowledge for design of hybridization probes or qPCR primers, RNA-Seq enables hypothesis-free analysis of any RNA molecule captured in a sequencing library. In addition to measuring counts of transcripts for differential expression levels [1], RNA-Seq also allows the detection of sequence-specific information, including transcript start and stop points [2], SNVs [3], and RNA editing events [4]. Other advantages include improved sensitivity, dynamic range, and ability to assess expression levels of fusion transcripts and variants. The Ion PGM sequencer democratizes RNA-Seq. With the Ion Total RNA-Seq Kit v2 [5], Ion semiconductor sequencing generates data that exceed microarray sensitivity, with additional quality control provided by the Ambion ® ERCC Spike-In Controls [6]. Ion semi- conductor sequencing also enables the detection of novel transcripts, gene fusions, and variation in allele-specific expression, all in a single experiment. With a portfolio of chips of varying outputs, the Ion PGM Sequencer scales to a variety of RNA-Seq applications for a broad range of transcriptome sizes. Along with straightforward analysis (using publicly available tools), Ion semiconductor sequencing provides the fastest sequencing workflow with sequencing run times of ~2 hours for 100-base reads. Comparison of gene expression profiling using Ion PGM Sequencer and microarrays It has been suggested that ~2 million mapped reads replicate results from hybridization arrays [7], but it has been demonstrated that increasing the number of reads will continue to increase the amount of information even after hundreds of millions of reads [8]. So, how many 100 base reads on an Ion PGM Sequencer are equivalent to a gene expres- sion microarray? This is the subject of the white paper [9] that compares Microarray Quality Control (MAQC) microarray data from Affymetrix GeneChip ® Human Genome U133 Plus 2.0 Arrays (HG-U133 Plus 2.0) with Ion PGM system–generated RNA-Seq data. Figure 1 shows the number of genes detected with Ion PGM Sequencer–derived data as a function of the detection level threshold as well as the genes detected by microarrays. At the most stringent detection threshold of ≥10 reads counts/gene, the number of genes detected with ~2 million Ion PGM Sequencer–derived mapped reads exceeds the genes detected by microarrays. When the read count threshold is relaxed to ≥5 read counts/gene, the number of genes detected by Ion RNA-Seq exceeds microarrays with just 1 million mapped reads. The number of significantly differentially expressed genes (sig DEGs) detected using ~2 million mean mapped Ion PGM Sequencer–derived reads was greater than those found in the microarray data. Using p ≤0.05 and fold change ≥2 between HBRR and UHRR samples as the criteria, the number of sig DEGs detected in the sequence data was 4,630 compared to 4,198 in the microarray data. • More sensitive than microarrays in detecting transcripts and changes in gene expression • Detect novel transcripts, fusion transcripts, and SNPs • Scalable technology offers multiple chips for a variety of RNA-Seq applications • Fastest sequencing workflow for RNA-Seq: ~2 hours for 100-base sequencing, with straightforward analysis using standard, publicly available tools APPLICATION NOTE RNA-Seq Gene expression profiling using Ion semiconductor sequencing For research use only. Not intended for any animal or human therapeutic or diagnostic use.

Transcript of Gene expression profiling using Ion semiconductor sequencing

Page 1: Gene expression profiling using Ion semiconductor sequencing

IntroductionNext-generation sequencing has transformed gene expression profiling and transcriptome studies. While other methods rely on a priori knowledge for design of hybridization probes or qPCR primers, RNA-Seq enables hypothesis-free analysis of any RNA molecule captured in a sequencing library. In addition to measuring counts of transcripts for differential expression levels [1], RNA-Seq also allows the detection of sequence-specific information, including transcript start and stop points [2], SNVs [3], and RNA editing events [4]. Other advantages include improved sensitivity, dynamic range, and ability to assess expression levels of fusion transcripts and variants.

The Ion PGM™ sequencer democratizes RNA-Seq. With the Ion Total RNA-Seq Kit v2 [5], Ion semiconductor sequencing generates data that exceed microarray sensitivity, with additional quality control provided by the Ambion® ERCC Spike-In Controls [6]. Ion semi-conductor sequencing also enables the detection of novel transcripts, gene fusions, and variation in allele-specific expression, all in a single experiment. With a portfolio of chips of varying outputs, the Ion PGM™ Sequencer scales to a variety of RNA-Seq applications for a broad range of transcriptome sizes. Along with straightforward analysis (using publicly available tools), Ion semiconductor sequencing provides the fastest sequencing workflow with sequencing run times of ~2 hours for 100-base reads.

Comparison of gene expression profiling using Ion PGM™ Sequencer and microarraysIt has been suggested that ~2 million mapped reads replicate results from hybridization arrays [7], but it has been demonstrated that increasing the number of reads will continue to increase the amount of information even after hundreds of millions of reads [8].

So, how many 100 base reads on an Ion PGM™ Sequencer are equivalent to a gene expres-sion microarray? This is the subject of the white paper [9] that compares Microarray Quality Control (MAQC) microarray data from Affymetrix GeneChip® Human Genome U133 Plus 2.0 Arrays (HG-U133 Plus 2.0) with Ion PGM™ system–generated RNA-Seq data. Figure 1 shows the number of genes detected with Ion PGM™ Sequencer–derived data as a function of the detection level threshold as well as the genes detected by microarrays. At the most stringent detection threshold of ≥10 reads counts/gene, the number of genes detected with ~2 million Ion PGM™ Sequencer–derived mapped reads exceeds the genes detected by microarrays. When the read count threshold is relaxed to ≥5 read counts/gene, the number of genes detected by Ion RNA-Seq exceeds microarrays with just 1 million mapped reads. The number of significantly differentially expressed genes (sig DEGs) detected using ~2 million mean mapped Ion PGM™ Sequencer–derived reads was greater than those found in the microarray data. Using p ≤0.05 and fold change ≥2 between HBRR and UHRR samples as the criteria, the number of sig DEGs detected in the sequence data was 4,630 compared to 4,198 in the microarray data.

• Moresensitivethanmicroarraysindetectingtranscriptsandchangesingeneexpression

• Detectnoveltranscripts,fusiontranscripts,andSNPs

• ScalabletechnologyoffersmultiplechipsforavarietyofRNA-Seqapplications

• FastestsequencingworkflowforRNA-Seq:~2hoursfor100-basesequencing,withstraightforwardanalysisusingstandard,publiclyavailabletools

APPLICATIONNOTERNA-Seq

GeneexpressionprofilingusingIonsemiconductorsequencing

For research use only. Not intended for any animal or human therapeutic or diagnostic use.

Page 2: Gene expression profiling using Ion semiconductor sequencing

Differential gene expression results from 2 million mapped Ion PGM™ Sequencer reads were compared with qPCR, and the Pearson correlation (R) of the log2 fold change of sig DEGs for qPCR and RNA-Seq exceeded 0.95.

An additional advantage to conducting RNA-Seq experiments on the Ion PGM™ platform is the ability to use the ERCC RNA Spike-In Mix to evaluate the sensitivity of transcript detection. These transcripts are polyadenylated, unlabeled RNAs which have been certified and tested by the National Institute of Standards and Technology (NIST) as a means to evaluate RNA measurement systems for performance and to control sources of variability. These transcripts have been balanced for GC content to closely represent characteristics of endogenous eukaryotic mRNAs, and their lengths range from 250 to 2,000 nucleotides. The ERCC pool of transcripts is configured in known titrations designed to represent a large dynamic range of expression levels. The ERCC ExFold RNA Spike-In Mix is used to evaluate the sensitivity of differential gene expression. These control RNAs allow users the ability to directly assess library quality, sensitivity, and dynamic range as well as

Figure 1. Gene detection in HBRR or UHRR samples as a function of cumulative mapped reads. Solid blue lines indicate gene level detection as counts from mapped reads are accumulated. Detection level thresholds for RNA-Seq data are shown at 1, 2, 5, and 10 read counts per gene. The orange dashed line indicates the 9,140 genes detected on MAQC HG-U133 Plus 2.0 microarrays. At the most stringent detection threshold of ≥10 reads counts/gene, the number of genes detected with 2 million Ion PGM™ Sequencer–derived mapped reads exceeds the genes detected by microarrays. When the read count threshold is relaxed to ≥5 read counts/gene, the number of genes detected by Ion RNA-Seq exceeds microarrays at 1 million mapped reads. For a more detailed discussion, see the white paper [9].

1 2 3 4 5

7000

8000

9000

10000

11000

12000

microarrays

Millions of mapped reads

Gen

es d

etec

ted

Detectionlevel

thresholdinreadcounts/

gene

Figure 3. Mapping reads to the genome enables detection of alternative splicing and novel exons. Ion PGM™ Sequencer–derived RNA-Seq results from analysis of a Ewing’s sarcoma cell line (data courtesy of T. Triche, Children’s Hospital Los Angeles). In this particular example, two novel exons (shown as red boxes) and alternate splicing events are revealed by the exon boundaries observed in the sequencing data. The loci have been anonymized as a publication is in preparation.

Read

s

Genomic coordinates

Figure 2. ERCC dose response. Sample ERCC dose-response scatter plot and linear regression statistics. Poly(A)–enriched RNA from HeLa cells spiked with Ambion® ERCC RNA Spike-In Mix was used to construct an RNA-Seq library and sequenced on an Ion 318™ Chip. Raw read counts (y-values) refer to total ERCC aligned reads; the x-values denote the relative concentration of each ERCC transcript in the pool. The grey-shaded area indicates the 90% confidence interval. R2 was 0.9475 with the sample size of 64.

log2 Relative ERCC Concentration

Log 2 E

RC

C c

ount

s (p

rese

nt c

alls

: >=

1 co

unt)

0

5

10 R2 = 0.9475N = 64

5 10 15 20

Page 3: Gene expression profiling using Ion semiconductor sequencing

expression fold changes in their experi-ments. The dose-response scatter plot shows a linear relationship between log2 of relative ERCC concentration and log2 of mapped reads (Figure 2). Direct comparison of results generated by collaborating laboratories is facilitated by using these external RNA controls.

Detection of novel exons, splice variants, fusion transcripts, and SNPs using Ion PGM™ SequencerThe discovery of unannotated exons and novel splice variants is one of the key advantages of RNA-Seq. Gene structure can be assessed by analyzing the distribu-tion of RNA-Seq reads across the genome. This is in direct contrast to a microarray, where oligonucleotide probes for transcript detection must be present on the arrays. In Figure 3, Ion PGM™ Sequencer–derived RNA-Seq reads are mapped to the genome, revealing two novel exons and alternate splicing events in this Ewing’s sarcoma cell line sample.

Fusion transcripts have been implicated as an important causative agent in differ-ent cancer types. Mapping the long and accurate RNA-Seq reads from the Ion PGM™ Sequencer to the whole genome rather than to a reference assembled from RefSeq transcripts enables direct detection of fusion transcripts by mapping reads across exon boundaries without prior knowledge of the boundaries. Novel events, like the fusion transcript between chromosome 22 and 11 in samples from a Ewing’s sarcoma cell line, shown in Figure 4, cannot be detected by conventional microarrays.

SNP discovery in transcripts and allele-specific expression can allow direct assessment of the change in RNA or protein structure. An example of SNP calling in data generated by the Ion PGM™ system is shown in Figure 5. In a case like this where a heterozygous SNP is present, you can directly measure the levels of the mutant transcript and correlate to the disease state.

It should be noted that SNP calling is more complex in RNA than DNA from the same individual because the expression level of each transcript can vary, affecting the total coverage needed to get the specific

coverage for that particular transcript. Thus, low expressors will require many more total reads than will high expressors. Other factors that may be present are allele-specific expression or RNA editing. In this context, having both genomic sequence and RNA sequence is an advantage, as the

transcriptome data can be overlaid on the genomic data.

Additionally, when a SNP is present in the genome but only a single or imbalanced allele is seen in the transcriptome, this suggests allele-specific expression.

Figure 4. Detection of fusion transcripts using RNA-Seq. Results generated with an Ion PGM™ Sequencer 100-base run and Ion 316™ Chip are mapped to the genome, showing a fusion transcript spanning chromosome 22 (left) and chromosome 11 (right) in the Ewing’s sarcoma cell line but not observed in fibroblast cells. (Data courtesy of T. Triche, Children’s Hospital Los Angeles).

Coverage

Coverage

Controlreads

Ewing’ssarcomareads

Figure 5. SNP detection with RNA-Seq. Ion PGM™ Sequencer–derived RNA-Seq results from analysis of a Ewing’s sarcoma cell line (Data courtesy of T. Triche, Children’s Hospital Los Angeles), show-ing an expressed SNP (top 3 traces are from cancer sample, next three samples show control samples).

Ewing’ssarcomasample1

Ewing’ssarcomasample2

Ewing’ssarcomasample3

Normalsample1

Normalsample2

Normalsample3

Page 4: Gene expression profiling using Ion semiconductor sequencing

lifetechnologies.com

For Research Use Only. Not intended for any animal or human therapeutic or diagnostic use.© 2012 Life Technologies Corporation. All rights reserved. The trademarks mentioned herein are the property of Life Technologies Corporation or their respective owners.Avadis is a registered trademark of Strand Life Sciences Private Limited. Partek is a registered trademark of Partek Incorporated. The content provided herein may relate to products that have not been officially released and is subject to change without notice.Printed in the USA. CO25396 0512

References1. Clonnan N. et al. 2008, Nat Methods, Stem cell transcriptome profiling via massive-scale mRNA sequencing, doi:10.1038/NMETH.12232. Hashimoto S. et al. 2009, PLoS One, High-Resolution Analysis of the 59-End Transcriptome Using a Next Generation DNA Sequencer,

doi:10.1371/journal.pone.00041083. Tuch B. et al. 2010, PLoS One, Tumor Transcriptome Sequencing Reveals Allelic Expression Imbalances Associated with Copy Number

Alterations. doi:10.1371/journal.pone.00093174. Picardi E. et al. 2010, Nucleic Acids Res, Large-scale detection and analysis of RNA editing in grape mtDNA by RNA deep-sequencing.

doi:10.1093/nar/gkq2025. http://products.invitrogen.com/ivgn/product/44759366. http://products.invitrogen.com/ivgn/product/44567407. Ramsköld D. et al. 2011, PLoS Comput Bio, An Abundance of Ubiquitously Expressed Genes Revealed by Tissue Transcriptome Sequence

Data. doi:10.1371/journal.pcbi.10005988. Toung J.M. et al. 2011, Genome Res, RNA-sequence analysis of human B-cells, doi:10.1101/gr.116335.1109. “Sensitivity of RNA-Seq using Ion semiconductor sequencing” ioncommunity.iontorrent.com10. “C18-199” data set, ioncommunity.iontorrent.com

Scalable technologyA unique advantage of RNA-Seq is that the sensitivity of the experiment can easily be adjusted by the number of reads generated. The portfolio of chips for the Ion PGM™ Sequencer together with the new RNA barcodes (Ion Xpress™ RNA-Seq Barcode 01-16 Kit) let you tailor the number of reads

for your experiment. The Ion 316™ Chip with 6 million wells is well suited for small transcriptomes, and the Ion 318™ Chip with greater than >11 million wells generates more than 2 million reads mapping to RefSeq exons [10] for greater sensitivity than HG-U133 Plus 2.0 arrays, as described here.

As Ion semiconductor sequencing technol-ogy continues to advance rapidly, the ability and methods to analyze transcriptomes will continue to improve. One example is the increase in the number of reads mapped to RefSeq when using the new “mRNA from total RNA” protocol for the Dynabeads® mRNA DIRECT™ Micro Kit that has been optimized to further reduce rRNA content. Figure 6 shows a dramatic, reduction in reads mapped to rRNA and the corresponding increase in reads mapped to RefSeq exons when using this short 30-minute protocol to select poly(A) RNA from total RNA.

Fastest sequencing workflow and straight-forward RNA-Seq analysisWith approximately 2 hours for 100-base sequencing, the Ion PGM™ sequencer offers the fastest sequencing workflow for RNA-Seq experiments. Analysis is a two-step process. Mapping to the genomic

reference can be carried out by the Torrent Software Suite with data exported as a BAM file (containing all mapped data). The BAM file is then analyzed in a NGS–specific gene expression package. A large number of such packages exist, including free software from academic departments (e.g., IsoEM from the University of Connecticut) as well as commercial products such as Avadis® from Strand Genomics and Partek® Genomics Suite software, that microarray users may already be using.

ConclusionThe Ion PGM™ Sequencer, in conjunction with Ion chips and the Ion Total RNA-Seq Kit v2, is a fast, simple, and scalable solution for RNA-Seq experiments. The Ion workflow is fully supported by Ambion® RNA prepara-tion kits offering the fastest solution from sample to data, at any read length, whether 100 or 200 bases. RNA-Seq on the Ion PGM™ Sequencer produces results comparable to gene expression arrays, with the additional advantages of being additive for further sen-sitivity and generating hypothesis-neutral sequencing data that allow for discovery of fusion transcripts, novel exons and alterna-tive splicing, and detection of SNPs and allele-specific expression patterns.

Figure 6. Impact of mRNA isolation method on RefSeq mapped read distribution. The Poly(A) Purist™ Kit or Dynabeads® mRNA DIRECT™ Micro Kit with the mRNA from Total RNA protocol was used to purify mRNA from HeLa total RNA. The resulting mRNA was used to prepare libraries for sequencing using the Ion Total RNA-Seq Kit v2 and sequenced on an Ion 318™ Chip. The graph shows the dramatic reduction of reads matching different portions of RefSeq.

86%  

88%  

90%  

92%  

94%  

96%  

98%  

100%  

Poly(A)  Purist™  Kit     Dynabeads®  mRNA  DIRECT™  Micro  Kit    

rRNA  

RefSeq  exons    (including  known  and  putaPve  juncitons)  

86%  

88%  

90%  

92%  

94%  

96%  

98%  

100%  

Poly(A)  Purist™  Kit     Dynabeads®  mRNA  DIRECT™  Micro  Kit    

rRNA  

RefSeq  exons    (including  known  and  putaPve  juncitons)  

RefSeqexons(includingknownandputativejunctions)