Gene expression analysis
BS312 Bioinformatics
Antonio Marco
School of Biological Sciences
University of Essex
10-Nov-15
Outline
1 Gene Expression
2 Measuring RNA expression levels
3 Data processing
4 Visualizing Gene Expression Patterns
5 Applications
6 Practical overview
The ’central dogma’ of molecular biology
DNA makes RNA makes Protein
Francis Crick (1956)
“I just didn’t know what dogma meant”
A more realistic scenario of gene expression
Analyzing Gene Expression: overview
Northern blot
The ’IS IT THERE?’ approach
Microarrays
The ’ARE ANY OF THESE THERE?’ approach
RNA-Seq
The ’WHAT’S IN THERE?’ approach
DNA sequencing is done in small fragments
RNA-Seq is a quantitative technique
Assessing data quality
• Ideally, a sequencer would always report the true sequence of each read
• In reality, reads often contain errors
• The good news: sequencers report how confident they are in each base call
Phred Score and FASTQ format
• The Phred score Q encodes the probability P that a base call is wrong: Q = -10 log10(P)
• The FASTQ format stores Phred scores as a one-letter code, one character per base
@SEQUENCE NAME
CATGGCTAGCTGCTAGCTAGCTAGACATTCATCGAAATCGCTAGCCTAGCTACGA
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>C%%%
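As a minimal sketch of how the one-letter code can be read, the Python below decodes a FASTQ quality string into Phred scores and error probabilities. It assumes the Phred+33 (Sanger) offset, which the slide does not state but which is consistent with '!' being the lowest score; the function names are hypothetical, and the quality string is the one from the example record, written with plain ASCII apostrophes.

```python
def phred_scores(quality_string, offset=33):
    """Convert each quality character to its Phred score.

    Assumption: Phred+33 encoding, so '!' (ASCII 33) means Q = 0.
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


quality = "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>C%%%"
scores = phred_scores(quality)

# Report the first few bases with their score and error probability.
for ch, q in list(zip(quality, scores))[:4]:
    print(ch, q, round(error_probability(q), 3))
```

Under this encoding a score of 20 corresponds to a 1 in 100 chance of error, and a score of 30 to 1 in 1000.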
Read quality
Normalization: principles
• Bilbo believes his sword is big, five ’hands’ in length
• Gandalf thinks Bilbo’s sword is rather short
• Who’s right?
Normalization: RPKM
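RPKM stands for Reads Per Kilobase of transcript per Million mapped reads. A minimal sketch of the standard calculation follows; the function name and all the numbers are hypothetical and chosen only for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    Dividing by gene length (in kb) corrects for longer genes collecting
    more reads; dividing by library size (in millions of reads) corrects
    for deeper sequencing runs.
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: two genes with the same raw read count
# but different lengths, in a library of 10 million mapped reads.
library_size = 10_000_000
print(rpkm(500, 2000, library_size))  # 2 kb gene
print(rpkm(500, 500, library_size))   # 0.5 kb gene
```

The two genes have the same raw count, but the shorter one gets the higher RPKM: like Bilbo's sword, a count only means something relative to a common unit.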
Normalization: Smear plots
• ’Smear plots’ show the average number of reads (X axis) against the fold difference between samples (Y axis)
• The average log fold difference (red line) should be about 0
• Normalization achieves this
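The coordinates of a smear plot can be sketched as below. This is an illustration under stated assumptions, not the lecture's own method: both axes use log2, a pseudocount avoids taking the log of zero, and the normalization is plain scaling to a common library size. The function names and the counts are hypothetical.

```python
import math


def smear_coordinates(counts_a, counts_b, pseudocount=1.0):
    """Return one (average log2 count, log2 fold difference) pair per gene."""
    points = []
    for a, b in zip(counts_a, counts_b):
        log_a = math.log2(a + pseudocount)
        log_b = math.log2(b + pseudocount)
        points.append(((log_a + log_b) / 2, log_b - log_a))
    return points


def scale_to_total(counts, target_total):
    """Rescale counts so that they sum to target_total (assumed normalization)."""
    total = sum(counts)
    return [c * target_total / total for c in counts]


def mean_fold_difference(points):
    """Average of the Y values: the 'red line' of the smear plot."""
    return sum(y for _, y in points) / len(points)


# Hypothetical counts: sample B was sequenced about twice as deeply as sample A.
sample_a = [10, 100, 1000, 50]
sample_b = [22, 190, 2100, 95]

before = smear_coordinates(sample_a, sample_b)
after = smear_coordinates(sample_a, scale_to_total(sample_b, sum(sample_a)))
print(mean_fold_difference(before), mean_fold_difference(after))
```

Before scaling, the average fold difference sits well above 0 because of the deeper sequencing of sample B; after scaling it moves close to 0.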
Hierarchical clustering
• In previous lectures you learnt that phylogenetic trees reflect sequence similarity
• Likewise, trees can be built to reflect expression similarity
• The most frequently used algorithm is UPGMA
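A minimal sketch of UPGMA applied to expression profiles follows. It repeatedly merges the two clusters with the smallest average pairwise distance between their members. Euclidean distance, the gene names and the expression values are assumptions made for illustration; the tree is returned as nested tuples without branch lengths.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def upgma(profiles):
    """Cluster genes by expression similarity.

    profiles: dict mapping gene name -> list of expression values.
    Returns the tree as nested tuples of gene names.
    """
    # Each key is a subtree; each value lists the genes inside it.
    clusters = {name: [name] for name in profiles}
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # UPGMA distance: mean over all between-cluster gene pairs.
            dists = [euclidean(profiles[i], profiles[j])
                     for i in clusters[a] for j in clusters[b]]
            avg = sum(dists) / len(dists)
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        # Merge the closest pair into a new subtree.
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))


# Hypothetical expression profiles across three conditions.
example = {
    "geneA": [1, 2, 3],
    "geneB": [1, 2, 4],
    "geneC": [10, 10, 10],
    "geneD": [10, 12, 10],
}
print(upgma(example))
```

Genes with similar profiles end up as neighbours in the tree, in the same way that similar sequences do in a phylogenetic tree.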
Hierarchical clustering and heatmaps
Time-course series
Drosophila (fruit fly) development
Adapted from: Graveley et al. (2011) Nature 471:473
Cancer
MicroRNAs in different cancer cells
Volinia et al. (2012) PNAS 109:3024
Reconstruct transcripts
Practical overview
Characterize the transcription profile of the Down’s Syndrome Critical Region
• Check the quality of the reads
• Map reads to a reference genome (human)
• Assemble reads into transcripts
• Visualize annotated transcripts
Recommended readings
Gene expression analysis:
• Pevsner J (2009) Bioinformatics and Functional Genomics. John Wiley & Sons. Chapter 9
• Mutz K-O et al. (2013) Transcriptome analysis using next-generation sequencing. Curr Op Biotech 24:22-30
Web resources:
• RNA-Seq at http://en.wikipedia.org/wiki/RNA-Seq