NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.
![Page 1: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/1.jpg)
NESCENT : NGS : Measuring expression
Jen Taylor
Bioinformatics Team
CSIRO Plant Industry
![Page 2: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/2.jpg)
CSIRO. Nescent August 2011 - Measuring Expression
Measuring Expression
• What & Why
  • What is expression and why do we care?
• How
  • Platforms / Technology
    • Closed approaches – Microarray
    • Open approaches – Sequencing
  • Experimental Design
• Analysis
  • Biases
  • Bioinformatics
  • Statistical Issues and Analysis
• In action
  • Workshop – Detection of Differential Expression
  • Case Studies in Plant functional genomics
![Page 3: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/3.jpg)
What is expression / transcriptome ?
DNA
RNA species: mRNA, rRNA, tRNA, siRNA, microRNA, piRNA, tasiRNA, lncRNA
![Page 4: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/4.jpg)
Commemorative stained glass window for F.C. Crick, designed by Maria McClafferty. (Photograph: Paul Forster)
Gonville & Caius College, Cambridge, UK.
Beyond the Genome:
1995
Human Genome sequencing begins in earnest
“Mapping the Book of Life”
2000 - First Draft
2003 - Essential Completion
= approx. 140,000 genes
= 30,000 – 40,000 genes ??
= 24,195 genes !!!???
![Page 5: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/5.jpg)
“The failure of the human genome”
“despite more than 700 genome-scanning publications and nearly $100bn spent, geneticists still had not found more than a fractional genetic basis for human disease”
Manolio et al., Nature, 2009
“The most likely explanation for why genes for common diseases have not been found is that, with few exceptions, they do not exist. … if inherited genes are not to blame for our commonest illnesses, can we find out what is?”
Guardian, 2011
![Page 6: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/6.jpg)
Beyond the Genome:
Gene Number ≠ Complexity
[Figure labels: Complexity, Regulation, Gene, Transcriptome]
![Page 7: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/7.jpg)
Why the expression ?
High-throughput friendly
Context dependent
Regulatory network
Predicts Biology
Transcriptome
Genome
Proteome
Li et al., 2004
![Page 8: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/8.jpg)
Measuring Expression ?
Parts Description
• Function?
• Interconnectedness?
Comparisons
• Population-level
• Between genomes
![Page 9: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/9.jpg)
Measuring Expression ?
What are important members of a transcriptome?
mRNA
• polyadenylated, coding
• alternatively spliced
Noncoding RNA (small RNA)
• varying lengths, functions (18 – 32 bases)
• microRNA, siRNA, piRNA, tasiRNA, long non-coding RNA
“Dark” RNA
• transcription outside of annotated genes
• Non-polyadenylated
Anti-sense transcription
![Page 10: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/10.jpg)
Measuring Expression ?
How does the transcriptome vary to give rise to phenotype ?
Changes in Abundance
• Abundance = Rate of Transcription – Rate of Decay
Changes in Function
• Availability for function – polyadenylation, silencing, localisation
• Suitability for function – alternate splicing
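The abundance relationship on this slide can be read as a rate equation. The sketch below is illustrative only: it assumes a constant synthesis rate and first-order decay (neither is stated on the slide), and `simulate_abundance` with its parameters is a hypothetical helper.

```python
def simulate_abundance(synthesis_rate, decay_rate, hours, dt=0.01, start=0.0):
    """Euler integration of dA/dt = synthesis_rate - decay_rate * A."""
    abundance = start
    for _ in range(int(round(hours / dt))):
        abundance += (synthesis_rate - decay_rate * abundance) * dt
    return abundance

# Abundance settles where synthesis balances decay: A = synthesis_rate / decay_rate.
steady = simulate_abundance(synthesis_rate=100.0, decay_rate=0.5, hours=40.0)
# Doubling the decay rate halves the steady-state abundance without any
# change in transcription.
faster_decay = simulate_abundance(synthesis_rate=100.0, decay_rate=1.0, hours=40.0)
print(round(steady), round(faster_decay))
```

Under these assumptions a change in measured abundance may reflect either transcription or decay, which is why the two terms are listed together.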
![Page 11: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/11.jpg)
How to measure Expression
PLATFORMS / TECHNOLOGY
![Page 12: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/12.jpg)
Measuring Expression : platforms
• Closed systems – microarray
  • Probes immobilised on a substrate profile target species in the transcriptome
![Page 13: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/13.jpg)
![Page 14: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/14.jpg)
Single and two colour arrays
[Figure: array workflow. Panels: Two colour (Control, Experimental) and Single colour (Sample A). Steps: Probe Library, Array Manufacture, Labelling, Hybridisation, Scanning]
![Page 15: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/15.jpg)
Array profiling
Affymetrix Array Targets
• Arabidopsis Genome 24,000
• C. elegans Genome 22,500
• Drosophila Genome 18,500
• E. coli Genome 20,366
• Human Genome U133 Plus 47,000
• Mouse Genome 39,000
• Yeast Genome
  • S. cerevisiae 5,841
  • S. pombe 5,031
• Rat Genome 30,000
• Zebrafish 14,900
• Plasmodium / Anopheles
  • P. falciparum 4,300
  • A. gambiae 14,900
• Barley (25,500), Soybean (37,500 + 23,300 pathogen), Grape (15,700)
• Canine (21,700), Bovine (23,000)
• B. subtilis (5,000), S. aureus (3,300 ORFs), Xenopus (14,400)
![Page 16: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/16.jpg)
![Page 17: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/17.jpg)
![Page 18: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/18.jpg)
Closed System – Microarray
• Pros
  • High-throughput
  • Targeted profiling
  • Inexpensive – “population friendly”
  • Analytical methods are standardised
• Cons
  • “Closed system”: novel = invisible
  • Difficult to see allele-specific expression
  • Biases due to hybridisation
    • SNPs
    • Competitive and non-specific hybridisation
![Page 19: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/19.jpg)
Open systems – RNA Sequencing
Technology:
• Illumina
• SOLiD, IonTorrent
• 454
Pros:
• Transcript discovery
• Allelic expression
• High resolution abundance measures
Cons:
• Analysis can be complex
• Expensive
• Sensitivity is sequencing depth dependent
![Page 20: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/20.jpg)
RNA Sequencing
Mortazavi et al., 2008
![Page 21: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/21.jpg)
RNASeq - Correspondence
• Range > 5 orders of magnitude
• Better detection of low abundance transcripts
Marioni et al., 2009
![Page 22: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/22.jpg)
Platform Choice / Sample Preparation Choice
What do you want to profile ?
• Polyadenylated
  • PolyA RNA extraction
• Small RNA (< 100 bases)
  • Size filtering by gel
• Strand-specific
• RNA – Protein Interactions
  • RNA Immunoprecipitation (IP)
![Page 23: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/23.jpg)
RNASeq - Workflow
[Flow diagram: Sample; Total RNA; PolyA RNA or Small RNA; Library Construction; Sequencing; Base calling & QC; Mapping to Genome or Assembly to Contigs]
Outputs: Differential Expression, SNP detection, Transcript structure, Secondary structure, Targets or Products
![Page 24: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/24.jpg)
Illumina RNASeq : TruSeq
![Page 25: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/25.jpg)
Small RNA sequencing
smallRNA separation: PAGE
small RNA < 35bp
[Figure: PAGE gel of small RNA; numeric labels 25, 75, 110, 134]
![Page 26: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/26.jpg)
Strand - specificity
Using adaptors
• SMART : addition of C’s on 5’ end
• Ligation : 3’ and 5’ adaptors added sequentially
Using chemical modification
• dUTP : Addition and removal after selection
Levin et al., 2010
![Page 27: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/27.jpg)
Levin et al., 2010
![Page 28: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/28.jpg)
Non-polyA methods
• Total RNA extraction
  • Ribosomal RNA and tRNA > 95-97% of total RNA
• Ribosomal reduction methods
  • Subtractive hybridisation with rRNA probes
  • Exonuclease cleavage of rRNA
  • NuGen – “proprietary combination of reverse transcriptase and primers in the Ovation RNA-Seq System”
• cDNA normalisation methods
  • Partial digestion of any highly abundant species (Evrogen)
![Page 29: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/29.jpg)
Platform Choice / Sample Preparation Choice
What do you want to profile ?
• Polyadenylated
  • PolyA RNA extraction
• Small RNA (< 100 bases)
  • Size filtering by gel
• Strand-specific
• RNA – Protein Interactions
  • RNA Immunoprecipitation (IP)
• Non-PolyA
  • rRNA reduction
![Page 30: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/30.jpg)
EXPERIMENTAL DESIGN and ANALYSIS
![Page 31: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/31.jpg)
RNASeq Experimental Design
• Issues:
  • sequencing depth – how much ?
  • number of replicates – how many ?
• Aims of the data :
  • Transcriptome assembly / transcript characterisation
    • Maximise depth
  • Detection of differential expression (de novo or reference)
    • Balance depth and replication
CSIRO. Sequencing Depth V.S. Number of Replicates
![Page 32: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/32.jpg)
Defining Replicates
• Technical Replicates: one Individual; Library 1 and Library 2; Lanes 1 – 4; Depth = 2 x 100% lane / sample
• Biological Replicates: Individual 1 and Individual 2; Library 1 and Library 2; Lane 1 and Lane 2; 100% lane / sample
• Multiplex: Library 1 – Library 4 (L1 – L4) share Lane 1; 25% lane / sample
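The lane fractions on this slide translate directly into reads per sample. A minimal sketch, assuming reads are split evenly between the libraries pooled on a lane; `reads_per_sample` is a hypothetical helper and the lane yield is an invented round number, not a figure from the talk.

```python
def reads_per_sample(lane_yield, samples_per_lane, lanes_per_sample=1):
    """Expected reads for one sample when each lane is shared evenly."""
    return lane_yield * lanes_per_sample / samples_per_lane

LANE_YIELD = 100_000_000  # hypothetical reads per lane

two_full_lanes = reads_per_sample(LANE_YIELD, samples_per_lane=1, lanes_per_sample=2)
one_full_lane = reads_per_sample(LANE_YIELD, samples_per_lane=1)
four_plex = reads_per_sample(LANE_YIELD, samples_per_lane=4)

print(two_full_lanes, one_full_lane, four_plex)
```

With a fixed number of lanes, multiplexing trades depth per sample for the ability to run more replicates.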
![Page 33: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/33.jpg)
![Page 34: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/34.jpg)
Coverage Depth
![Page 35: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/35.jpg)
Number of Replicates
edgeR <= 0.01 , DESeq <= 0.01
More information in biological replicates than depth for differential expression

| # Reps | 2 | 4 | 6 | 8 | 10 | 12 |
| --- | --- | --- | --- | --- | --- | --- |
| False P | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 |
| False N | 0.84 | 0.72 | 0.64 | 0.59 | 0.54 | 0.50 |
| True P | 0.16 | 0.28 | 0.36 | 0.41 | 0.46 | 0.50 |
| True N | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 |
![Page 36: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/36.jpg)
RNASeq Analysis
• Overall Aim :
  • To get an accurate measurement of transcript abundance, structure and identity
• Biases and Compositions
• Alignment
  • TopHat / Cufflinks
• Assembly
  • ABySS
![Page 37: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/37.jpg)
Assumptions
Every transcript / k-mer has equal chance of being sequenced
No. sequences observed ≈ transcript abundance
Gene A = z Reads / million Gene B = y Reads / million
z = 2 x y
Gene A > Gene B
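Under these assumptions, comparing two genes within one sample reduces to scaling raw counts by library size. A minimal sketch of the reads-per-million calculation; the gene names and counts are invented for illustration and `reads_per_million` is a hypothetical helper.

```python
def reads_per_million(gene_counts):
    """Scale raw read counts to reads per million sequenced reads."""
    total = sum(gene_counts.values())
    return {gene: count * 1_000_000 / total for gene, count in gene_counts.items()}

# Hypothetical library of one million reads.
counts = {"gene_A": 400, "gene_B": 200, "all_other_genes": 999_400}
rpm = reads_per_million(counts)

# z = 2 x y, so gene A is read as twice as abundant as gene B, but only if
# every transcript really had an equal chance of being sequenced.
ratio = rpm["gene_A"] / rpm["gene_B"]
print(rpm["gene_A"], rpm["gene_B"], ratio)
```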
![Page 38: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/38.jpg)
Length Bias
Oshlack and Wakefield, 2009
![Page 39: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/39.jpg)
Alignment Bias
![Page 40: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/40.jpg)
Alignment Bias
![Page 41: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/41.jpg)
Sequencing Bias
Hansen et al., 2010
![Page 42: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/42.jpg)
Bias
Every transcript / k-mer has equal chance of being sequenced
No. sequences observed ≈ transcript abundance
Gene A = z Reads / million / kb Gene B = y Reads / million / kb
Weighting schemas (e.g. Cufflinks) :
• Mappability
• kmer / fragment frequencies
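The "reads / million / kb" unit on this slide corrects for transcript length as well as library size. A minimal sketch of that calculation (commonly called RPKM); the counts and lengths are invented, `rpkm` is a hypothetical helper, and the Cufflinks weighting schemes listed above are not reproduced here.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)

TOTAL_READS = 1_000_000  # hypothetical library size

# Gene A collects twice the reads of gene B but is four times longer.
gene_a = rpkm(read_count=400, transcript_length_bp=4000, total_mapped_reads=TOTAL_READS)
gene_b = rpkm(read_count=200, transcript_length_bp=1000, total_mapped_reads=TOTAL_READS)

print(gene_a, gene_b)
```

After the length correction gene B comes out as the more abundant transcript, which reverses the ranking given by raw counts.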
![Page 43: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/43.jpg)
Bias
Every transcript / k-mer has equal chance of being sequenced
No. sequences observed ≈ transcript abundance
Gene A1 = z Reads per million Gene A2 = y Reads per million
z = 2 x y
Sample A vs Sample B
![Page 44: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/44.jpg)
Read density variability
![Page 45: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/45.jpg)
RNASeq – Compositional properties
Depth of Sequence
• Sequence count ≈ Transcript Abundance
• Majority of the data can be dominated by a small number of highly abundant transcripts
• Ability to observe transcripts of smaller abundance is dependent upon sequence depth
• Fixed budget of reads
![Page 46: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/46.jpg)
A simple example – compositional bias
Sequencing budget / depth: 4000 reads
sample I: transcripts A, B, C, D; expected counts 1000, 1000, 1000, 1000
sample II: transcripts A, B; expected counts 2000, 2000
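The example can be reproduced numerically. The sketch below assumes that transcripts A and B have the same true abundance in both samples and that reads are shared in proportion to abundance; `expected_counts` is a hypothetical helper.

```python
def expected_counts(true_abundance, budget):
    """Split a fixed read budget in proportion to true abundance."""
    total = sum(true_abundance.values())
    return {name: budget * amount / total for name, amount in true_abundance.items()}

BUDGET = 4000  # sequencing budget from the slide

sample_1 = expected_counts({"A": 1, "B": 1, "C": 1, "D": 1}, BUDGET)
sample_2 = expected_counts({"A": 1, "B": 1}, BUDGET)

# A is unchanged between samples, yet its count doubles because C and D
# no longer take a share of the budget.
apparent_fold_change = sample_2["A"] / sample_1["A"]
print(sample_1["A"], sample_2["A"], apparent_fold_change)
```

Counts are therefore relative measures: a change in one part of the transcriptome shifts the counts of everything else.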
![Page 47: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/47.jpg)
Soil diversity by phylogenetic analysis - Phylum level
[Figure: % distribution (0% – 100%) of recognized bacterial phyla in samples A, B and C]
454-sequence analysis of bacterial 16S rRNA gene, ~410,000 sequences
A. Richardson, CSIRO
![Page 48: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/48.jpg)
RNASeq Bioinformatics Analysis
• Aims:
  • To get an accurate measurement of transcript abundance, structure and identity
• Biases and Compositions
  • Relative abundances NOT absolute
• Alignment
  • TopHat
• Assembly
  • ABySS
![Page 49: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/49.jpg)
RNA Sequencing analysis
Sequence Data
Alignment
Read Density
Differential Expression
SNPs
Transcript Characterisation
Assembly
Contigs
Genome?
![Page 50: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/50.jpg)
RNASeq – Alignment Considerations
Reads with multiple locations
• Discard / Random Allocation
• Clustering - local coverage
• Weighting
Reads Spanning Exons
• Make and align to exon junction libraries
• Denovo junction detection
Summarisation of counts
• Exons
• Transcript boundaries
• Inferred read boundaries
![Page 51: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/51.jpg)
TopHat
Trapnell et al., 2009; Roberts et al., 2011
Multimapping : ≤10 sites
Assembly : consensus ‘island’ exon
![Page 52: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/52.jpg)
TopHat / Cufflinks
Trapnell et al., 2009; Roberts et al., 2011
Heuristics :
• “Correct” errors in low coverage areas
• Grabs 45 bp either side of islands to capture splice sites
• Collapse small islands
• Looks for junctions within larger islands, highly covered
Cufflinks :
• calculates the probability of observing a certain fragment within a given transcript given surrounding fragments.
![Page 53: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/53.jpg)
Alignment
• Great if you have a fully annotated reference
• Okay if you have a partially annotated reference
• “Different” if you have a big bunch of ESTs
Options:
• Align to a neighbouring genome or EST library
• De novo transcriptome assembly
Tools:
• ABySS, Mira, Trinity, HT-Seq, SAMtools
![Page 54: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/54.jpg)
![Page 55: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/55.jpg)
Denovo transcriptome assembly
Assemblers: ABySS, MIRA, Trinity, Velvet, AllPaths, Soap-denovo, Euler, CABOG, Edena, SHARCGS, VCAKE, SSAKE, CAP3
• Will run on reasonable computer resources for large genomes (e.g. < 1 TB of RAM)
• Paired end data handling
• Platform flexible
• Handles haplotype complexity and polyploid genomes
![Page 56: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/56.jpg)
![Page 57: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/57.jpg)
Assembly – Kmer graphs
K = 4
Miller et al., 2010
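As an illustration of the idea, the sketch below builds a small k-mer graph with K = 4, where each node is a k-mer and an edge joins two k-mers that are adjacent in a read (overlapping by k-1 bases). The reads and the helpers `kmer_graph` and `walk` are invented; this is not how ABySS or any other assembler is implemented.

```python
from collections import defaultdict

def kmer_graph(reads, k=4):
    """Map each k-mer to the set of k-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for left, right in zip(kmers, kmers[1:]):
            graph[left].add(right)
    return graph

def walk(graph, start):
    """Extend a contig from `start` while the path is unambiguous."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        following = next(iter(graph[node]))
        if following in seen:  # stop on a cycle (e.g. a repeat)
            break
        contig += following[-1]
        seen.add(following)
        node = following
    return contig

reads = ["AACCGG", "CCGGTT"]  # hypothetical overlapping reads
graph = kmer_graph(reads, k=4)
contig = walk(graph, "AACC")
print(contig)
```

The walk stops wherever a node has more than one successor, which is where the graph structures discussed on the following slides arise.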
![Page 58: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/58.jpg)
Assembly – Kmer graphs
Spurs
• Sequencing error
Bubbles
• Sequencing error
• Polymorphism
Frayed Rope / Cycles
• Repeats
Miller et al., 2010
![Page 59: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/59.jpg)
![Page 60: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/60.jpg)
ABySS & TransABySS
• User specifies k
• Optimal k depends on sequencing depth
![Page 61: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/61.jpg)
ABySS & TransABySS
• Sequencing depth is relative to transcript abundance
  • Iterate over multiple k and merge
• Contigs contained within a large contig are “buried”
![Page 62: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/62.jpg)
Assessing assembly quality ?
• Comparisons between assembly algorithms
• Contig summary statistics
• Comparisons to known resources (e.g. ESTs)
Trial on Rice Transcriptome:
• 120 Million 75 bp single end Illumina reads – embryo
• ABySS :
  • Number of contigs = 6,804
  • Contig length range = 38 – 2,818 [mean = 203]
• Database comparisons :
  • Rice public cDNA sequences : 67,393
  • Contigs with high quality matches to cDNA : 6,555 (96%)
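Contig summary statistics of the kind quoted here are simple to compute from a list of contig lengths. A minimal sketch: `contig_summary` is a hypothetical helper, the example lengths are invented, and N50 is included as a commonly reported statistic even though the slide does not quote it. Only the 6,555 and 6,804 figures come from the slide.

```python
def contig_summary(lengths):
    """Count, length range, mean and N50 for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if running * 2 >= total:  # half of the assembled bases reached
            n50 = length
            break
    return {
        "count": len(ordered),
        "min": ordered[-1],
        "max": ordered[0],
        "mean": total / len(ordered),
        "n50": n50,
    }

summary = contig_summary([100, 200, 300, 400, 500])  # hypothetical contig lengths

# Fraction of contigs with a high quality cDNA match, using the slide's figures.
matched_percent = round(100 * 6555 / 6804)
print(summary, matched_percent)
```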
![Page 63: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/63.jpg)
RNASeq Bioinformatics Analysis
• Aims:
  • To get an accurate measurement of transcript abundance, structure and identity
• Biases and Compositions
  • Relative abundances NOT absolute
• Alignment
• Assembly
![Page 64: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/64.jpg)
STATISTICAL ISSUES
![Page 65: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/65.jpg)
Measuring Expression – Statistical Issues
• Data elements
• Normalisation
• Detection of Differential Expression
![Page 66: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/66.jpg)
Count Data : of what ?
![Page 67: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/67.jpg)
Count Data : of what ?
Garber et al., 2011
![Page 68: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/68.jpg)
Statistical analysis of RNASeq
• Count data
  • Distribution is positively skewed, not normal
  • Between sample variability in counts – normalisation
![Page 69: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/69.jpg)
Normalization is required
Two scenarios :
1. Different sizes of total reads (library size)
2. Fixed library size, subset of highly expressed reads in 1 sample.
Both reduce sequencing budget available for the majority of transcripts
![Page 70: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/70.jpg)
Normalisation
TMM : Trimmed Mean of M values (log ratios)
• Assume the majority of log ratios = 0 [No change]
• Adjust TMM to be equal between samples
Robinson and Oshlack, 2010
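The idea behind TMM can be sketched in a few lines. This is a deliberately simplified version: it trims only on the log ratios and takes an unweighted mean, whereas the published method also trims on absolute expression and uses precision weights. `tmm_factor`, the 30% trim and the counts are assumptions chosen for illustration.

```python
import math

def tmm_factor(sample, reference, trim=0.3):
    """Simplified TMM: 2 ** (trimmed mean of per-gene log2 ratios of proportions)."""
    n_sample, n_reference = sum(sample), sum(reference)
    m_values = sorted(
        math.log2((s / n_sample) / (r / n_reference))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0  # log ratios are undefined for zero counts
    )
    if not m_values:
        raise ValueError("no genes with non-zero counts in both samples")
    k = int(len(m_values) * trim)
    kept = m_values[k:len(m_values) - k] if len(m_values) > 2 * k else m_values
    return 2 ** (sum(kept) / len(kept))

# Nine unchanged genes plus one gene that dominates the second library.
reference = [100] * 10
sample = [100] * 9 + [1100]

factor = tmm_factor(sample, reference)
effective_library_size = sum(sample) * factor
print(factor, effective_library_size)
```

In this invented case the factor shrinks the second library's effective size back to that of the reference, so the nine unchanged genes have log ratios of 0 after normalisation.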
![Page 71: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/71.jpg)
DE genes with and without TMM normalization
![Page 72: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/72.jpg)
RNASeq data – Poisson Distributions
• Poisson distributions are used when things are counted
• The probability of seeing n events in a fixed time or space
  • The number of lions on a 1 day safari
  • The number of raindrops on a tennis court
  • The number of flying elephants in a year
• Requires λ : rate of events
  • Variance = mean = λ
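The property Variance = mean = λ can be checked numerically from the Poisson probability mass function. A minimal sketch using only the standard library; the rate of 3 events is an arbitrary illustrative value and both helpers are hypothetical names.

```python
import math

def poisson_pmf(n, lam):
    """Probability of seeing exactly n events when the expected number is lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

def mean_and_variance(lam, n_max=100):
    """Mean and variance computed by summing over the first n_max + 1 outcomes."""
    probs = [poisson_pmf(n, lam) for n in range(n_max + 1)]
    mean = sum(n * p for n, p in enumerate(probs))
    variance = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return mean, variance

mean, variance = mean_and_variance(lam=3.0)
print(mean, variance)
```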
![Page 73: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/73.jpg)
RNASeq data – Negative Binomial
• RNASeq data is more variable than Poisson
  • Variance > mean = λ
  • Less prominent for large mean
  • Over-dispersed Poisson
Noise types
• Shot noise
  • Unavoidable, prominent for low mean
• Technical noise
  • Small, hopefully, can be managed
• Biological noise
  • Sample differences
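One way to see how shot noise and the extra variability combine is through the squared coefficient of variation. The sketch assumes the common negative binomial parameterisation variance = mean + dispersion × mean², which the slide does not spell out; the dispersion of 0.1 and both helper names are illustrative.

```python
def squared_cv(mean, dispersion):
    """Squared coefficient of variation for variance = mean + dispersion * mean**2."""
    variance = mean + dispersion * mean ** 2
    return variance / mean ** 2  # equals 1/mean (shot noise) + dispersion

def shot_noise_share(mean, dispersion):
    """Fraction of the squared CV contributed by the Poisson (shot noise) term."""
    return (1.0 / mean) / squared_cv(mean, dispersion)

DISPERSION = 0.1  # hypothetical value

low = shot_noise_share(mean=2, dispersion=DISPERSION)
high = shot_noise_share(mean=1000, dispersion=DISPERSION)
print(round(low, 3), round(high, 3))
```

Under this parameterisation shot noise accounts for most of the relative variability at low counts and very little at high counts, where the dispersion term takes over.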
![Page 74: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/74.jpg)
RNA Seq
• Variance also depends on the mean
Anders, 2010
![Page 75: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/75.jpg)
RNASeq Model
The total counts for a transcript in sample j from condition c :
$$\sigma^2 = \mu_c s_j + v_c s_j^2$$
• $s_j$ : Library normalisation
• $\mu_c$ : Mean Value
• $v_c$ : Fitted Variance (overdispersion)
For a given gene , test for a difference in counts between conditions.
Is mean c1 + mean c2 statistically different to mean c1 + mean c1?
![Page 76: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/76.jpg)
RNASeq DE Testing
• DESeq – Anders and Huber, 2010
• edgeR – Robinson et al., 2009 – R
• baySeq – Hardcastle and Kelly, 2010 – R
• DEGSeq – Wang et al., 2010 – R
• NBP – Di et al., 2011
• LOX – Zhang et al., 2010
  • Infers expression measures allowing for incorporation of noise from different methodologies in the one experimental design
![Page 77: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/77.jpg)
![Page 78: NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry.](https://reader035.fdocuments.in/reader035/viewer/2022062216/56649d985503460f94a82619/html5/thumbnails/78.jpg)
Thank you
Plant Industry
Jennifer M Taylor
Bioinformatics Leader
Phone: +61 2 62464929
Email: [email protected]
Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected] Web: www.csiro.au
Acknowledgements
Jose Robles
Stuart Stephen
Hua Ying
Andrew Spriggs
Alexie Pa
NESCENT Funding