Microarray Analysis, Jesse Mecham, CS 601R
Microarray Analysis
It all comes down to
  Experimental Design
  Preprocessing
  Data Analysis
Experimental Design
Elimination of confounding factors
  Same cell line, minimal exposure
  Timing of sampling
Technological considerations
  Hybridization considerations
  Chip/tag selection
Slide to Data
Gene              Value
D26528_at           193
D26561_cds1_at      -70
D26561_cds2_at      144
D26561_cds3_at       33
D26579_at           318
D26598_at          1764
D26599_at          1537
D26600_at          1204
D28114_at           707
Preprocessing
Data import
Background adjustment
Normalization
Summarization of multiple probes per transcript
Quality control
Data Import
Incorporate various file formats into desired data formats
  Different vendors have different representations
  Sometimes desired data is not provided
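As a minimal sketch of the import step, the following reads a two-column "Gene / Value" table like the one on the Slide to Data slide into a dictionary. The tab-delimited layout, the function name `read_expression_table`, and the choice to store `None` when a value is absent are all assumptions made for illustration; real vendor formats differ.

```python
import csv
import io


def read_expression_table(handle, delimiter="\t"):
    """Parse 'Gene<TAB>Value' rows into a dict of probe ID -> float (or None)."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader)  # skip the header row ("Gene", "Value")
    values = {}
    for row in reader:
        if len(row) < 2 or not row[0].strip():
            continue  # ignore blank or malformed rows
        probe_id, raw = row[0].strip(), row[1].strip()
        try:
            values[probe_id] = float(raw)
        except ValueError:
            values[probe_id] = None  # desired data was not provided
    return values


# Hypothetical file content: two rows from the slide plus one with a missing value.
sample = "Gene\tValue\nD26528_at\t193\nD26561_cds1_at\t-70\nD26579_at\t\n"
table = read_expression_table(io.StringIO(sample))
```

Keeping missing values explicit, rather than dropping the row, lets later steps decide how to handle them.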
Background Adjustment
It all comes down to one word… noise
  Optical distortion
  Non-specific hybridization
  Equipment damage
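One simple way to adjust for background, shown here only as a sketch, is to subtract a per-spot background estimate from the foreground intensity and clamp the result so it stays positive for later log transforms. The slides do not say which method was used; the function name, the `floor` parameter, and the intensities below are hypothetical.

```python
def background_adjust(foreground, background, floor=1.0):
    """Subtract per-spot background from foreground, clamping at `floor`."""
    if len(foreground) != len(background):
        raise ValueError("foreground and background must have equal length")
    # Clamping keeps values positive so a logarithm can be taken afterwards.
    return [max(fg - bg, floor) for fg, bg in zip(foreground, background)]


# Hypothetical red-channel intensities for three spots.
red_fg = [1500.0, 220.0, 90.0]
red_bg = [100.0, 120.0, 110.0]
adjusted = background_adjust(red_fg, red_bg)
```

The third spot has a background above its foreground, so plain subtraction would give a negative intensity; the floor avoids that at the cost of some bias for very dim spots.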
M vs. A
M represents differential ratio
  M = log R - log G
A represents the fluorescence intensity
  A = (log R + log G)/2
Desirable transformation would show uniform distribution of differential across intensities
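The two definitions above can be computed directly from red and green channel intensities. This sketch assumes base-2 logarithms, which the slide does not specify, and uses made-up intensities; `ma_values` is a hypothetical helper name.

```python
import math


def ma_values(red, green):
    """Return (M, A) lists: M = log R - log G, A = (log R + log G) / 2."""
    m, a = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)  # base 2 is an assumption
        m.append(log_r - log_g)
        a.append((log_r + log_g) / 2)
    return m, a


# Hypothetical intensities for three spots.
red = [1024.0, 256.0, 64.0]
green = [256.0, 256.0, 256.0]
m, a = ma_values(red, green)
```

Plotting M against A for every spot gives the M vs. A plot; a well-behaved array shows M scattered evenly around zero at every value of A.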
Normalization
Normalization between samples needs to be established for a variety of reasons
  Different reverse transcription efficiency levels
    We are using PCR to amplify in separate plates
  Hybridization inequalities
    Variations in solution used in hybridization reaction
  Spatial abnormalities between plates
    Particularly apparent for in-house plates
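A very simple form of normalization, used here only as an illustration and not named in the slides, is to shift each array's log-ratios so that their median is zero. This removes an overall dye or efficiency offset but not intensity-dependent or spatial effects. The function name and the values are hypothetical.

```python
import statistics


def median_center(m_values):
    """Shift log-ratios so the array's median log-ratio becomes zero."""
    center = statistics.median(m_values)
    return [m - center for m in m_values]


# Hypothetical log-ratios from one array with an overall upward shift.
m_array = [0.5, 1.5, 0.0, 1.0, 2.0]
normalized = median_center(m_array)
```

The underlying assumption is that most genes are not differentially expressed, so the typical log-ratio should be zero.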
Summarizing Data
Process of reducing the various samples into an analysis
  The crux of microarray analysis
Can apply a linear or a non-linear model using any of the following techniques
  Support Vector Machines (SVM)
  Neural Networks
  Empirical Bayes
Quality Control
Concerned with accuracy and reproducibility
  Dr. Piatetsky-Shapiro (last week’s colloquium) was primarily concerned with this area of microarray analysis
Detection of errors (cross-validation)
Isolation and validation of significant results
Corrective behavior
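Cross-validation, mentioned above for error detection, repeatedly holds out part of the samples and checks a model on them. The following is a minimal sketch of how k-fold train/test index splits could be generated; the function name, the sample count of 16, and the choice of 4 folds are assumptions for illustration.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical experiment with 16 arrays split into 4 folds.
splits = list(k_fold_splits(n_samples=16, k=4))
```

Each sample appears in exactly one test fold, so every array is used once to check a model fitted on the others.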
Time for Fun
Dataset: ApoAI.RData
  The apolipoprotein AI (ApoAI) gene is known to play a pivotal role in high density lipoprotein (HDL) metabolism. Mice which have the ApoAI gene knocked out (KO) have very low HDL cholesterol levels.
  Purpose is to determine how ApoAI deficiency affects the action of other genes in the liver
  Help determine what molecular pathways ApoAI operates on
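One common way to ask which genes respond to the knockout, assumed here rather than taken from the slides, is a per-gene two-sample t-statistic comparing log-ratios from KO and wild-type mice. The sketch below uses Welch's unequal-variance form; the function name and the log-ratios are made up and are not values from ApoAI.RData.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch two-sample t-statistic for one gene (unequal variances allowed)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical log-ratios for a single gene in four KO and four WT mice.
ko = [-3.1, -2.9, -3.3, -3.0]
wt = [0.1, -0.2, 0.0, 0.2]
t = welch_t(ko, wt)
```

A large negative statistic suggests lower expression in the knockout group. With thousands of genes tested at once, some adjustment for multiple testing would be needed before calling any single gene significant.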
Markers
All mRNA data from both knockout and wild-type were marked GREEN
KO and WT are marked RED
  Oftentimes, both populations are run on same plate with one being marked RED and the other marked GREEN
R
www.r-project.org
“S”-like GNU project language and environment for statistical computing
Great free package for linear and non-linear statistical modeling
Also includes:
  an effective data handling and storage facility,
  a suite of operators for calculations on arrays, in particular matrices,
  a large, coherent, integrated collection of intermediate tools for data analysis,
  graphical facilities for data analysis and display either on-screen or on hardcopy, and
  a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
Bioconductor
http://bioconductor.org
Open source package for statistical analysis of genomic data
Includes both statistical and graphical tools
Active project with a constant influx of new packages
Does not include more complex analysis tools at this time (SVMs, etc.)