Expression Data and Microarrays CMMB November 29, 2001 Todd Scheetz.
Expression Data and Microarrays
CMMB
November 29, 2001
Todd Scheetz
Overview
Gene expression
– mRNA
– protein
Northern Blots
RT-PCR
SAGE
MicroArray
Gene Expression Review
Transcription
– generation of mRNA from genomic DNA
– a complete copy is made, including both introns and exons: the pre-mRNA
[Figure: genomic DNA and the pre-mRNA transcribed from it, with poly(A) tail (AAAA...)]
Gene Expression Review
Processing / Splicing
– removal of the introns from the pre-mRNA yields the mature mRNA
– also exported from the nucleus to the cytoplasm
– alternative splicing
[Figure: one pre-mRNA giving rise to several mature mRNAs (splice variants), each with poly(A) tail (AAAA...)]
Gene Expression Review
Translation
– takes an mRNA molecule and uses it to construct an amino acid sequence.
– the ribosome is the underlying machinery used in the process of translation.
Measuring Gene Expression
Two major differentiating factors…
– Quantitative vs. qualitative
– mRNA vs. protein
Most techniques can be used to determine quantitative expression levels.
Ex. EST sequencing
Measuring Gene Expression
More sophisticated experiments…
– Comparing expression levels of multiple genes
– Comparing co-regulation or differential regulation
Ex. EST sequencing
Northern Blot
Measure relative expression levels of mRNA
1. mRNA isolation and purification
2. electrophorese on a gel
3. The gel is probed by hybridizing with a labeled clone for the gene under study.
RT-PCR
Measures relative expression of mRNA
1. Isolate and purify mRNA
2. reverse transcription
3. PCR amplification
4. run on gel and probe/hybridize
RT-PCR
Why use RT?
– Can observe very low levels of expression
– Requires very small amounts of mRNA
The bad…
– Potential expression-level skew due to non-linearity of PCR
– Have to design multiple custom primers for each gene
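One way amplification can skew expression levels is sketched below with the usual idealized model, N = N0 * (1 + E)^n, where E is the per-cycle efficiency. The function name, starting copy numbers, efficiencies, and cycle count are all made-up assumptions for illustration, not values from the lecture.

```python
def amplified_copies(initial_copies: float, efficiency: float, cycles: int) -> float:
    """Idealized PCR yield: each cycle multiplies the template by (1 + efficiency)."""
    return initial_copies * (1.0 + efficiency) ** cycles


# Two transcripts that start at a true 2:1 ratio (hypothetical numbers).
gene_a_start, gene_b_start = 2000.0, 1000.0

# Assumed per-cycle efficiencies: the primers for gene A work slightly worse.
gene_a_efficiency, gene_b_efficiency = 0.90, 0.95
cycles = 25

gene_a_end = amplified_copies(gene_a_start, gene_a_efficiency, cycles)
gene_b_end = amplified_copies(gene_b_start, gene_b_efficiency, cycles)

true_ratio = gene_a_start / gene_b_start
observed_ratio = gene_a_end / gene_b_end
print(f"true A:B ratio     = {true_ratio:.2f}")
print(f"observed A:B ratio = {observed_ratio:.2f}")
```

Under these assumptions a small efficiency difference, compounded over 25 cycles, brings a true 2:1 ratio close to 1:1. Real reactions also plateau, which this model ignores.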
SAGE
Tags are isolated and concatemerized.
Relative expression levels can be compared between cells in different states.
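A minimal sketch of how tag counts from two cell states might be compared: counts are scaled to tags per million so that libraries of different sizes are comparable, then a ratio is taken per tag. The tag sequences and counts are invented, and the function names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict


def tags_per_million(tag_counts: Counter) -> Dict[str, float]:
    """Scale raw tag counts so libraries of different sizes are comparable."""
    total = sum(tag_counts.values())
    return {tag: 1_000_000 * count / total for tag, count in tag_counts.items()}


def expression_ratios(state_a: Counter, state_b: Counter) -> Dict[str, float]:
    """Ratio (state B / state A) for tags observed in both libraries."""
    tpm_a = tags_per_million(state_a)
    tpm_b = tags_per_million(state_b)
    shared = tpm_a.keys() & tpm_b.keys()
    return {tag: tpm_b[tag] / tpm_a[tag] for tag in shared}


# Hypothetical tag counts from two libraries (100 and 200 tags sequenced).
state_a = Counter({"AAACCCGGGT": 30, "TTTGGGCCCA": 60, "ACGTACGTAC": 10})
state_b = Counter({"AAACCCGGGT": 120, "TTTGGGCCCA": 60, "ACGTACGTAC": 20})

ratios = expression_ratios(state_a, state_b)
for tag in sorted(ratios):
    print(tag, ratios[tag])
```

Tags seen in only one library are skipped here; a real analysis would need a policy for them, and a statistical test rather than a bare ratio.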
SAGE
-- gene to tag mapping
http://www.ncbi.nlm.nih.gov/SAGE/SAGEcid.cgi?cid=28726
MicroArray
What are they?
– Allow 1000s of expression analyses to be performed concurrently.
What technologies are used?
How to analyze the image?
How to analyze the expression data?
What bioinformatics challenges are there?
Potential Microarray Applications
• Drug discovery / toxicology studies
• Mutation/polymorphism detection
• Differing expression of genes over:
– Time
– Tissues
– Disease States
• Sub-typing complex genetic diseases
DNA Array Technology
Array Type             Spot Density (per cm²)  Probe   Target  Labeling
Nylon Macroarrays      < 100                   cDNA    RNA     Radioactive
Nylon Microarrays      < 5000                  cDNA    mRNA    Radioactive/Fluorescent
Glass Microarrays      < 10,000                cDNA    mRNA    Fluorescent
Oligonucleotide Chips  < 250,000               oligos  mRNA    Fluorescent
Physical Spotting
MicroArray
Glass Microarray
326 Rat Heart Genes, 2x spotting
Photolithographic
MicroArray
Overview of data capture
– two different mRNA populations, labeled with different fluors
– excited by a laser
– each fluor emits at a different wavelength, which is captured using a photodetector attached to a filter tuned to that particular fluor
MicroArray
Overview of image analysis
• spot identification
– grid alignment
– skew
• image normalization
– variable background
– uneven hybridization
Microarray Data Pipeline
Image Analysis/Data Quantization
• Feature (target probe) segmentation
• Data extraction and quantization of:
– Background
– Feature
• Correlation of feature identity and location within image
• Display of pseudo-color image
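As a rough illustration of what is done with the quantized background and feature values, the sketch below subtracts the local background from each channel and takes a log2 ratio between the two channels. The clipping floor and the intensity values are assumptions for illustration; actual image-analysis packages differ in how they estimate background.

```python
import math


def net_intensity(feature: float, background: float, floor: float = 1.0) -> float:
    """Background-subtracted signal, clipped at `floor` so the log stays defined."""
    return max(feature - background, floor)


def log2_ratio(ch1_feature: float, ch1_background: float,
               ch2_feature: float, ch2_background: float) -> float:
    """log2(channel 2 / channel 1) after background subtraction."""
    ch1 = net_intensity(ch1_feature, ch1_background)
    ch2 = net_intensity(ch2_feature, ch2_background)
    return math.log2(ch2 / ch1)


# Hypothetical spot: channel 1 reads 1200 over a background of 200,
# channel 2 reads 4200 over a background of 200.
spot_ratio = log2_ratio(1200, 200, 4200, 200)
print(spot_ratio)
```

A positive value means the spot is brighter in channel 2, a negative value brighter in channel 1.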
Image Segmentation
Microarray Experiment Design
• Type I: (n = 2)
– How is this gene expressed in target 1 as compared to target 2?
– Which genes show up/down regulation between the two targets?
• Type II: (n > 2)
– How does the expression of gene A vary over time, tissues, or treatments?
– Do any of the expression profiles exhibit similar patterns of expression?
Motivation & Design Constraints
• Probe set design involves the prioritizing and parsing of an initial data set containing potentially hundreds of thousands of probe candidates to define a reasonably sized set for use in a microarray experiment
• A single hybridization can produce several thousand data tuples, each containing multiple (n>10) measurements
• No “all-in-one” software package is currently available; communication of data between the packages must therefore be facilitated by the pipeline
Probe Set Design
• Goal of probe set design is to identify a reasonably sized subset of probes from a much larger starting set from a variety of sources
• By defining a set of criteria, an investigator should be able to create new probe sets or refine existing sets
• Pruning a data set should be done in several stages:
– Use readily available information to limit scope of data
– Obtain more information about remaining probes
– Narrow focus based on additional information
– Iterate until desired data set is obtained
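The staged pruning described above could be sketched as a sequence of filters applied until the set is small enough. The `Probe` fields, the filter stages, and the target size are hypothetical; in practice the later stages would first have to fetch the extra information (for example BLAST results) for the probes that survive the earlier ones.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Probe:
    clone_id: str
    species: str
    tissue: str
    has_sequence: bool
    blast_score: float  # hypothetical confidence value, 0 to 1


Stage = Callable[[Probe], bool]


def prune(probes: Iterable[Probe], stages: List[Stage], target_size: int) -> List[Probe]:
    """Apply filter stages in order, stopping once the set is small enough."""
    remaining = list(probes)
    for stage in stages:
        if len(remaining) <= target_size:
            break
        remaining = [probe for probe in remaining if stage(probe)]
    return remaining


candidates = [
    Probe("c1", "rat", "heart", True, 0.90),
    Probe("c2", "rat", "liver", True, 0.80),
    Probe("c3", "mouse", "heart", True, 0.95),
    Probe("c4", "rat", "heart", False, 0.00),
    Probe("c5", "rat", "heart", True, 0.20),
]
stages = [
    lambda p: p.species == "rat" and p.tissue == "heart",  # readily available info
    lambda p: p.has_sequence,                               # sequence available
    lambda p: p.blast_score >= 0.5,                         # additional information
]
selected = prune(candidates, stages, target_size=1)
print([probe.clone_id for probe in selected])
```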
Sample Probe Set Design Criteria
• 1° -- Direct
– Species
– Tissue
– Chromosome
– Sequence Available
  • Quality
  • Tail/Poly(A) signal
– Map position known?
– Cluster size
• 2° -- Indirect
– Blast results
  • Confidence value
  • Homology (or lack of)
  • Annotation contains words like “transfer”
  • 3’ & 5’ EST reads hit same gene
– Syntenic Map Information
– Known phenotypes in other species
cDNA Microarray Slide Creation
• cDNA clones defining a probe set must be re-arrayed from their sources (e.g. local storage or commercial) into a format suitable for amplification and printing (e.g. 96-well microtiter plates)
• Based on the size of the probe set and the limitations of the printer, a parameter set (# of pens, spot spacing, grid dimensions,…) must be defined for printing the probe set onto the slide(s)
• A mapping operation must be performed to track each probe from source to destination, so that known information can be correlated with a particular “spot” in a microarray image
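A toy version of such a mapping is sketched below: wells from 96-well plates are assigned to spots in row-major order, and the resulting table lets a spot in the image be traced back to its source well. The layout is a deliberately simple assumption; real printers deposit spots in pen-dependent patterns, so an actual mapping has to follow the printer's parameter set.

```python
from typing import Dict, NamedTuple


class Source(NamedTuple):
    plate: int
    well: str  # "A1" .. "H12" on a 96-well plate


class Spot(NamedTuple):
    grid: int
    row: int
    col: int


def build_spot_map(n_plates: int, grid_rows: int, grid_cols: int) -> Dict[Spot, Source]:
    """Assign wells to spots in row-major order (hypothetical layout)."""
    spot_map: Dict[Spot, Source] = {}
    spots_per_grid = grid_rows * grid_cols
    index = 0
    for plate in range(1, n_plates + 1):
        for well_row in "ABCDEFGH":
            for well_col in range(1, 13):
                grid, offset = divmod(index, spots_per_grid)
                row, col = divmod(offset, grid_cols)
                spot_map[Spot(grid + 1, row + 1, col + 1)] = Source(plate, f"{well_row}{well_col}")
                index += 1
    return spot_map


# Two plates printed into a single 12 x 16 grid.
spot_map = build_spot_map(n_plates=2, grid_rows=12, grid_cols=16)
print(spot_map[Spot(grid=1, row=7, col=1)])
```

Looking up a spot returns the plate and well it came from, which is the key needed to attach clone and gene information to a quantized feature.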
MicroArray
Overview of data analysis
• vs. time
• vs. other genes
– co-regulation
– differential regulation
– pathway identification
Data Analysis
• Data analysis consists of several post-quantization steps:
– Statistics/Metrics Calculations
– Scaling/Normalization of the Data
– Differential Expression
– Coordinated Gene Expression (aka clustering)
• Most software packages perform only a limited number of analysis tasks
• Databases can facilitate the movement of data between packages
Scaling and/or Normalization
• Positive Controls
– ‘Spiked’ DNA
– Housekeeping Genes
– Total Array
• Negative Controls
– Foreign DNA
– ‘Empty’ spots
• Linear regression
• Log-linear regression
• Ratio statistics
• Log(ratio) mean/median centering
• Nonlinear regression
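Of the methods listed, log(ratio) median centering is the simplest to sketch: take the log2 ratio of the two channels for every spot and subtract the array-wide median, which assumes that most genes on the array are unchanged. The intensities below are invented and assumed to be already background-subtracted.

```python
import math
from statistics import median
from typing import List, Sequence


def median_center_log_ratios(channel_1: Sequence[float],
                             channel_2: Sequence[float]) -> List[float]:
    """log2(channel 2 / channel 1) per spot, shifted so the median is zero."""
    log_ratios = [math.log2(c2 / c1) for c1, c2 in zip(channel_1, channel_2)]
    shift = median(log_ratios)
    return [value - shift for value in log_ratios]


# Hypothetical array where channel 2 is globally about twice as bright.
channel_1 = [100, 200, 400, 800, 1000]
channel_2 = [200, 400, 800, 6400, 500]

centered = median_center_log_ratios(channel_1, channel_2)
print(centered)
```

After centering, the three spots that only reflect the global brightness difference sit at zero, leaving the two genuinely different spots at +2 and -2.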
MicroArray
Bioinformatics challenges
1. data management
2. utilizing data from multiple experiments (type II)
3. utilizing data from multiple groups
* with different technologies
* with only processed data available
[Figures: database(s) linking +/- expression calls, expression level vs. timepoint/condition (1-4) for genes A-E, expression level vs. time (0-180), and local alignment within a search window]
MicroArray
data management
clone - spot
clone - gene
raw expression level
normalized expression level
annotation/links
expression profile
MArray Expt Mgmt Redux
Experiment 5-Tuple: (Probe Set_ID, Target_ID, Hyb Condition_ID, Hyb Iteration_ID, GenePix_Analysis_ID)
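One way to carry this 5-tuple around in code is as an immutable record that can serve as a lookup key. The field names follow the slide; the integer types, the example IDs, and the result file names are assumptions for illustration.

```python
from typing import Dict, NamedTuple


class Experiment(NamedTuple):
    probe_set_id: int
    target_id: int
    hyb_condition_id: int
    hyb_iteration_id: int
    genepix_analysis_id: int


# Hypothetical index from experiment key to the file holding its quantized data.
results: Dict[Experiment, str] = {}

first = Experiment(probe_set_id=1, target_id=7, hyb_condition_id=2,
                   hyb_iteration_id=1, genepix_analysis_id=1)
# A repeat hybridization differs only in its iteration ID.
repeat = first._replace(hyb_iteration_id=2)

results[first] = "hyb_0001.gpr"   # hypothetical file name
results[repeat] = "hyb_0002.gpr"  # hypothetical file name
print(len(results), repeat)
```

Because every component is part of the key, repeated hybridizations or re-analyses of the same image remain distinct entries.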
Database Support (EBI Schema)
http://www.ebi.ac.uk/arrayexpress/
http://www.bioinf.man.ac.uk/microarray/maxd
Differential Expression
• Type I analysis
• Look for genes with vastly different expression under different conditions
– How do you measure “vastly different”?
– What role should derived statistics play?
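A fixed fold-change cutoff is one simple (and somewhat arbitrary) answer to "vastly different". The sketch below flags genes whose expression differs by at least two-fold between two conditions; the cutoff, gene names, and values are assumptions, and the approach ignores measurement variance, which is where derived statistics would come in.

```python
import math
from typing import Dict


def differentially_expressed(condition_a: Dict[str, float],
                             condition_b: Dict[str, float],
                             min_fold: float = 2.0) -> Dict[str, float]:
    """Return log2(B / A) for genes changing by at least `min_fold` either way."""
    threshold = math.log2(min_fold)
    hits: Dict[str, float] = {}
    for gene in condition_a.keys() & condition_b.keys():
        log_fold_change = math.log2(condition_b[gene] / condition_a[gene])
        if abs(log_fold_change) >= threshold:
            hits[gene] = log_fold_change
    return hits


# Hypothetical normalized expression levels.
condition_a = {"g1": 1000, "g2": 5000, "g3": 800}
condition_b = {"g1": 4000, "g2": 5500, "g3": 200}

hits = differentially_expressed(condition_a, condition_b)
print(sorted(hits.items()))
```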
Type I: Differential Expression
[Figure: scatter plot of Gene 1 vs. Gene 2, both axes 0 to 60,000]
Coordinated Gene Expression
• Type II analysis
• “Eisen”ized data (dendrograms)
• Self-Organizing Maps
• Principal Component Analysis
• k-means Clustering
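As an illustration of the last item, here is a minimal k-means sketch over expression profiles using squared Euclidean distance. The profiles and the choice of initial centroids are invented; real analyses typically use an established implementation, several random starts, and often a correlation-based distance.

```python
from typing import List, Sequence

Profile = Sequence[float]


def squared_distance(a: Profile, b: Profile) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles: List[Profile]) -> List[float]:
    return [sum(values) / len(profiles) for values in zip(*profiles)]


def k_means(profiles: List[Profile], initial_centroids: List[Profile],
            max_iterations: int = 100) -> List[int]:
    """Return the cluster index assigned to each profile."""
    centroids = [list(c) for c in initial_centroids]
    assignments: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: each profile joins its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in profiles
        ]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(profiles, assignments) if a == k]
            if members:
                centroids[k] = mean_profile(members)
    return assignments


# Hypothetical profiles over four timepoints: two rising, two falling.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
    [3.9, 3.2, 2.1, 0.8],
]
clusters = k_means(profiles, initial_centroids=[profiles[0], profiles[2]])
print(clusters)
```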
Hierarchical Clustering
Self Organizing Maps
Current Software
Capabilities tracked (one X per capability a package covers): Probe Set Design, Quantization, Statistics, Normalization, Differential Expression, Coordinated Gene Expression (CGE)

Software Name      Provider                  Capabilities
Array Explorer     Spotfire Inc              X X X X
Array Gauge        FujiFilm                  X X X X
ArrayDesigner      Premier Biosoft Intl Inc  X
ArraySCOUT         Lion Bioscience           X X X
ArrayStat          Imaging Research Inc      X X X
ArrayViewer        TIGR                      X X X
ArrayVision        Imaging Research Inc      X X
arrayWoRx          Applied Precision Inc     X X X
Cluster/Xcluster   Stanford University       X X
Crazy Quant        U of Washington           X
GeneCluster        MIT                       X X
GenePix Pro        Axon Instruments          X X X
GeneSight          Biodiscovery              X X X X
GeneSpring         Silicon Genetics          X X X X
GeneTAC            Genomic Solutions         X X
Imagene            Biodiscovery              X X X X
MicroArray Suite   Scanalytics               X X X X
Micromax           NEN                       X X X
OmniGrid           GeneMachines              X
Pathways Analysis  Research Genetics         X X X X X
Quant Array        Packard Instrument Co     X X X X
ScanAlyze          Stanford University       X X X
SeqArray           GCG                       X X
Spotfinder         TIGR                      X X X
DotsReader         Cose                      X X X
Resolver           Rosetta
Software/Pipeline Integration
• A centralized database facilitates the archival, manipulation, and mining of all microarray data
• Most analysis programs can output data in a textual format which is easily input into the database
• Output from one program can be used as input to a second program either directly or through a filtering operation facilitated by the database and a set of programs to mine and manipulate the data
• Data from multiple hybridizations may need to be combined in order to perform coordinated gene expression analysis
Standards...
Want ability to exchange microarray experiment data using a common format.
MGED -- Microarray Gene Expression Data group
www.mged.org
MAGE-ML
Rosetta Inpharmatics
GEML -- www.geml.org
MIAME -- Minimum Information About a Microarray Experiment
Data and Limitations
Current Controversy:
Should the raw data be archived?
If so, who should do it?
Each slide (25 mm x 75 mm) is scanned at 200 pixels per mm.
Typical spot size = 100 µm
Center-to-center = 195 µm
Potential spots = 42,000
“Raw” image size = ~250 MB
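A back-of-the-envelope check of these figures is sketched below. The slide dimensions, scan resolution, and spot pitch come from the slide; the 16-bit pixel depth, the two channels, and the use of the full slide area are assumptions, so the results are upper bounds. The somewhat lower numbers quoted above would be consistent with a smaller usable area, though the lecture does not say how they were derived.

```python
# Values stated on the slide.
slide_width_mm, slide_length_mm = 25, 75
pixels_per_mm = 200
spot_pitch_um = 195  # center-to-center spacing

# Assumptions not stated on the slide.
bytes_per_pixel = 2  # 16-bit intensities
channels = 2         # one image per fluor

pixels_wide = slide_width_mm * pixels_per_mm
pixels_long = slide_length_mm * pixels_per_mm
total_pixels = pixels_wide * pixels_long
raw_bytes = total_pixels * bytes_per_pixel * channels

# Whole spots that fit along each edge at the given pitch.
spots_across = (slide_width_mm * 1000) // spot_pitch_um
spots_along = (slide_length_mm * 1000) // spot_pitch_um
max_spots = spots_across * spots_along

print(f"pixels per channel: {total_pixels:,}")
print(f"raw size (upper bound): {raw_bytes / 1e6:.0f} MB")
print(f"spots (upper bound): {max_spots:,}")
```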
Other Types of Microarrays
• Genomic BAC arrays
– allow assessment of “small” deletions
• Tissue arrays
– allow assessment of protein expression
Type II: Data Partitioning
• Identify genes with similar expression
• Grouping unknown genes with known genes may provide insight into function of unknown genes
• Only useful for genes with varying expression levels
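The points above can be sketched with a variance filter followed by Pearson correlation between an unknown gene and a known one. The gene names, profiles, and standard-deviation cutoff are invented for illustration. A flat profile is filtered out first because its correlation is undefined (the denominator is zero), which is one concrete reason the approach only works for genes with varying expression.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (both must vary)."""
    mean_a, mean_b = mean(a), mean(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a) ** 0.5
    spread_b = sum((y - mean_b) ** 2 for y in b) ** 0.5
    return covariance / (spread_a * spread_b)


def varying_genes(profiles: Dict[str, Sequence[float]],
                  min_sd: float) -> Dict[str, Sequence[float]]:
    """Keep only genes whose expression actually changes across samples."""
    return {gene: p for gene, p in profiles.items() if pstdev(p) >= min_sd}


# Hypothetical profiles over four conditions.
profiles = {
    "known_gene": [1.0, 2.0, 3.0, 4.0],
    "unknown_gene": [2.0, 4.0, 6.0, 8.0],
    "flat_gene": [5.0, 5.0, 5.0, 5.0],
}
kept = varying_genes(profiles, min_sd=0.5)
similarity = pearson(kept["known_gene"], kept["unknown_gene"])
print(sorted(kept), round(similarity, 3))
```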
Protein Expression
Protein expression may not correlate with mRNA expression.
How to measure levels of protein expression?
Immunochemistry
– 2-antibody approach
Protein Expression
Indirect Immunofluorescence
fix the cells
permeabilize the cells
incubate with primary antibody
incubate with secondary antibody
Protein Expression
Immunofluorescence
green -- tubulin
red -- gamma tubulin
blue -- DNA
Protein Expression
Immunofluorescence
red -- alpha tubulin
green -- vimentin (cytoskeletal protein)
blue -- DNA
Protein Expression
High-throughput methods
array multiple tissue samples onto slide, and hybridize