Introduction to Microarray
Dr G. P. S. Raghava
Molecular Biology Overview
Cell Nucleus
Chromosome
Protein
Gene (DNA)
Gene (mRNA), single strand
Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell. Measuring protein would be more direct, but is currently harder.
Measuring Gene Expression
(RT)
The Goals
Basic Understanding
– Arrays can take a snapshot of which subset of genes in a cell is actively making proteins
– Heat shock experiments
Medical diagnosis
– Microarrays can indicate where mutations lie that might be linked to a disease. Still others are used to determine if a person’s genetic profile would make him or her more or less susceptible to drug side effects
– 1999: A GeneChip containing 6800 human genes was used to distinguish between myeloid leukemia and lymphoblastic leukemia using a set of 50 genes that have different activity levels
Drug design
– Pharmaceutical firms are in a rush to translate the human genome results into new products
  Potential profits are huge
  First, though, they must figure out what the genes do, how they interact, and how they relate to diseases.
– Evaluation, Specificity, Response
Microarray Potential Applications
Biological discovery
– new and better molecular diagnostics
– new molecular targets for therapy
– finding and refining biological pathways
Recent examples
– molecular diagnosis of leukemia, breast cancer, ...
– appropriate treatment for genetic signature
– potential new drug targets
History
1980s: antibody-based assay (protein chip?)
~1991: high-density DNA-synthetic chemistry (Affymetrix/oligo chips)
~1995: microspotting (Stanford Univ/cDNA chips)
– replacing porous surface with solid surface
– replacing radioactive label with fluorescent label
– improvement in sensitivity
What is a DNA Microarray?
genes or gene fragments attached to a substrate (glass)
Hybridized slide
Two dyes
Image analyzed
Tens of thousands of spots/genes
=entire genome in 1 experiment
A Revolution in Biology
Gene Expression Microarrays
The main types of gene expression microarrays:
– Short oligonucleotide arrays (Affymetrix)
– cDNA or spotted arrays (Brown/Botstein)
– Long oligonucleotide arrays (Agilent Inkjet)
Terms/Jargon
Stanford/cDNA chip
– one slide/experiment
– one spot
– 1 gene => one spot or few spots (replica)
– control: control spots
– control: two fluorescent dyes (Cy3/Cy5)
Affymetrix/oligo chip
– one chip/experiment
– one probe/feature/cell
– 1 gene => many probes (20~25 mers)
– control: match and mismatch cells
Affymetrix Microarrays
[Figure: raw image of the chip; labels: 50 µm, 1.28 cm]
~10^7 oligonucleotides, half Perfectly Match mRNA (PM), half have one Mismatch (MM)
Raw gene expression is intensity difference: PM - MM
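The PM - MM difference can be sketched in a few lines. The intensities below are made-up illustrative numbers, and averaging the probe pairs with a plain mean is an assumption, not something the slide specifies.

```python
import numpy as np

# Hypothetical intensities for the probe pairs of one gene (illustrative only).
pm = np.array([1200.0, 950.0, 1430.0, 800.0])  # perfect-match cells
mm = np.array([300.0, 410.0, 380.0, 790.0])    # mismatch cells

# Raw signal per probe pair, as on the slide: PM - MM.
diff = pm - mm

# Assumption: summarize the probe pairs with a plain mean.
raw_expression = diff.mean()
print(diff, raw_expression)
```

A pair such as the last one, where MM is nearly as bright as PM, contributes almost nothing, which is the point of the mismatch control.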
DNA Microarrays
Each probe consists of thousands of strands of identical oligonucleotides
– The DNA sequences at each probe represent important genes (or parts of genes)
Printing Systems
– Ex: HP, Corning Inc.
– Printing systems can build lengths of DNA up to 60 nucleotides long
– 1.28 x 1.28+ cm glass wafer
  Each “print head” has a ~100 µm diameter, and heads are separated by ~100 µm (approx. 5,000 – 20,000 probes)
Photolithographic Chips
– Ex: Affymetrix
– 1.28 x 1.28 cm glass/silicon wafer
  24 x 24 µm probe site (approx. 500,000 probes)
– Lengths of DNA up to 25 nucleotides long
– Requires a new set of masks for each new array type
GeneChip
The Process
[Figure: sample preparation flow]
Cells → Poly-A RNA (AAAA) → cDNA → IVT (in-vitro transcription, 10% biotin-labeled uracil) → antisense cRNA, labeled (L) → Fragment (heat, Mg2+) → labeled fragments → Hybridize → Wash/stain → Scan
Hybridization and Staining
[Figure: GeneChip + biotin-labeled cRNA (L) → hybridized array, then + SAPE (streptavidin-phycoerythrin)]
Microarray Data
First, the Problems:
1. The fabrication process is not error free
2. Probes have a maximum length of 25-60 nucleotides
3. Biological processes such as hybridization are stochastic
4. Background light may skew the fluorescence
5. How do we decide if/how strongly a particular gene is being expressed?
Solutions to these problems are still in their infancy
Affymetrix “Gene chip” system
– Uses 25 base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
– RNA labeled and scanned in a single “color”: one sample per chip
– Can have as many as 20,000 genes on a chip
– Arrays get smaller every year (more genes)
– Chips are expensive
– Proprietary system: “black box” software, can only use their chips
cDNA Microarray Technologies
– Spot cloned cDNAs onto a glass microscope slide (usually PCR amplified segments of plasmids)
– Label 2 RNA samples with 2 different colors of fluorescent dye: control vs. experimental
– Mix two labeled RNAs and hybridize to the chip
– Make two scans, one for each color
– Combine the images to calculate ratios of amounts of each RNA that bind to each spot
cDNA microarrays
Compare the genetic expression in two samples of cells
PRINT: cDNA from one gene on each spot
SAMPLES: cDNA labelled red/green, e.g. treatment / control, normal / tumor tissue
HYBRIDIZE: Add equal amounts of labelled cDNA samples to microarray.
SCAN: Laser, Detector
“Long Oligos”
– Like cDNAs, but instead of using a cloned gene, design a 40-70 base probe to represent each gene
– Relies on genome sequence database and bioinformatics
– Reduces cross hybridization
– Cheaper and possibly more sensitive than Affy. system
Images from scanner
Resolution
– standard 10 µm [currently, max 5 µm]
– 100 µm spot on chip = 10 pixels in diameter
Image format
– TIFF (tagged image file format) 16 bit (65,536 levels of grey)
– 1 cm x 1 cm image at 16 bit = 2 MB (uncompressed)
– other formats exist, e.g. SCN (used at Stanford University)
Separate image for each fluorescent sample
– channel 1, channel 2, etc.
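The figures on this slide follow from simple arithmetic, sketched below. The function name `image_size_bytes` is a hypothetical helper introduced only for illustration.

```python
def image_size_bytes(width_cm, height_cm, pixel_um, bits_per_pixel):
    """Uncompressed size of a scan, given pixel size in micrometres."""
    pixels_x = round(width_cm * 10_000 / pixel_um)  # 1 cm = 10,000 µm
    pixels_y = round(height_cm * 10_000 / pixel_um)
    return pixels_x * pixels_y * bits_per_pixel // 8

size = image_size_bytes(1, 1, 10, 16)  # 1 cm x 1 cm at 10 µm, 16 bit
spot_pixels = 100 / 10                 # 100 µm spot at 10 µm per pixel
grey_levels = 2 ** 16                  # levels available in a 16-bit image

print(size, spot_pixels, grey_levels)
```

At the 5 µm maximum resolution the same area holds four times as many pixels, so the file is four times larger.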
Processing of images
Addressing or gridding
– Assigning coordinates to each of the spots
Segmentation
– Classification of pixels either as foreground or as background
Intensity determination for each spot
– Foreground fluorescence intensity pairs (R, G)
– Background intensities
– Quality measures
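A minimal sketch of segmentation and intensity determination for one spot, under simplifying assumptions: the spot centre and radius are taken as already known from gridding, a fixed circle is used as the foreground mask, and the median is used as the summary. Real image-analysis packages use more elaborate methods; `spot_intensity` is a hypothetical name.

```python
import numpy as np

def spot_intensity(image, cx, cy, radius):
    """Fixed-circle segmentation of one spot patch (a simplifying assumption)."""
    yy, xx = np.indices(image.shape)
    foreground = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
    fg = np.median(image[foreground])   # foreground intensity
    bg = np.median(image[~foreground])  # local background intensity
    return fg, bg

# Synthetic 21 x 21 patch: background of 100 with a bright disc of 1000.
patch = np.full((21, 21), 100.0)
yy, xx = np.indices(patch.shape)
patch[(xx - 10) ** 2 + (yy - 10) ** 2 <= 25] = 1000.0

fg, bg = spot_intensity(patch, cx=10, cy=10, radius=5)
print(fg, bg)
```

Running this once per channel gives the pairs (Rfg, Rbg) and (Gfg, Gbg) for a spot.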
Images in analysis software
– The two 16-bit images (Cy3, Cy5) are compressed into 8-bit images
– Display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image
– RGB image:
  Blue values (B) are set to 0
  Red values (R) are used for Cy5 intensities
  Green values (G) are used for Cy3 intensities
– Qualitative representation of results
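The overlay can be sketched as follows. The slide does not say how the 16-bit images are compressed to 8 bits; dropping the low byte is an assumption made here for simplicity (software may use a nonlinear transform instead), and `overlay` is a hypothetical name.

```python
import numpy as np

def overlay(cy5, cy3):
    """Build a 24-bit RGB overlay from two 16-bit channel images."""
    # Assumption: compress 16 bits to 8 by keeping only the high byte.
    r = (cy5 >> 8).astype(np.uint8)  # red   <- Cy5
    g = (cy3 >> 8).astype(np.uint8)  # green <- Cy3
    b = np.zeros_like(r)             # blue is set to 0
    return np.dstack([r, g, b])

# Four synthetic pixels: both bright, Cy3 only, Cy5 only, both dark.
cy5 = np.array([[65535, 0], [65535, 0]], dtype=np.uint16)
cy3 = np.array([[65535, 65535], [0, 0]], dtype=np.uint16)

rgb = overlay(cy5, cy3)
print(rgb[0, 0], rgb[0, 1], rgb[1, 0], rgb[1, 1])
```

Equal signal in both channels gives yellow, Cy3 alone gives green, and Cy5 alone gives red.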
Images: examples
[Figure: Cy3 image, Cy5 image, and pseudo-colour overlay]

Spot colour   Signal strength       Gene expression
yellow        Control = perturbed   unchanged
red           Control < perturbed   induced
green         Control > perturbed   repressed
Quantification of expression
For each spot on the slide we calculate
Red intensity = Rfg - Rbg
Green intensity = Gfg - Gbg
(fg = foreground, bg = background)
and combine them in the log (base 2) ratio
Log2(Red intensity / Green intensity)
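A short sketch of this calculation for a single spot. The input numbers are illustrative, and returning `None` for spots whose corrected intensity is not positive is an assumption about how to handle them, not something stated on the slide.

```python
import math

def log_ratio(r_fg, r_bg, g_fg, g_bg):
    """Background-corrected log2(Red / Green) for one spot."""
    red = r_fg - r_bg
    green = g_fg - g_bg
    if red <= 0 or green <= 0:
        # Assumption: a spot at or below background has no usable log ratio.
        return None
    return math.log2(red / green)

m = log_ratio(2100, 100, 600, 100)  # red = 2000, green = 500
print(m)
```

A value of 2 means the red channel is four times the green, 0 means equal signal, and negative values mean the green channel is stronger.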
Gene Expression Data
On p genes for n slides: p is O(10,000), n is O(10-100), but growing.
Gene expression level of gene 5 in slide 4 = Log2(Red intensity / Green intensity)
Rows are genes, columns are slides:

Gene   slide 1   slide 2   slide 3   slide 4   slide 5   …
1       0.46      0.30      0.80      1.51      0.90     ...
2      -0.10      0.49      0.24      0.06      0.46     ...
3       0.15      0.74      0.04      0.10      0.20     ...
4      -0.45     -1.03     -0.79     -0.56     -0.32     ...
5      -0.06      1.06      1.35      1.09     -1.09     ...
These values are conventionally displayed on a red (>0) yellow (0) green (<0) scale.
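The genes-by-slides layout maps directly onto a 2-D array. The sketch below uses the five rows and five columns shown on the slide; the `colour` helper is a hypothetical illustration of the red/yellow/green convention.

```python
import numpy as np

# Rows are genes 1-5, columns are slides 1-5 (values from the slide).
expr = np.array([
    [ 0.46,  0.30,  0.80,  1.51,  0.90],
    [-0.10,  0.49,  0.24,  0.06,  0.46],
    [ 0.15,  0.74,  0.04,  0.10,  0.20],
    [-0.45, -1.03, -0.79, -0.56, -0.32],
    [-0.06,  1.06,  1.35,  1.09, -1.09],
])

gene, slide = 5, 4
value = expr[gene - 1, slide - 1]  # arrays are 0-indexed

def colour(m):
    """Display colour for a log ratio: red (>0), yellow (0), green (<0)."""
    return "red" if m > 0 else "green" if m < 0 else "yellow"

print(value, colour(value))
```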
[Figure: analysis workflow]
Biological question (differentially expressed genes, sample class prediction, etc.) → Experimental design → Microarray experiment → 16-bit TIFF files → Image analysis → (Rfg, Rbg), (Gfg, Gbg) → Normalization → R, G → Estimation, Testing, Clustering, Discrimination → Biological verification and interpretation
Quality control (-> Flag)
How good are foreground and background measurements?
– Variability measures in pixel values within each spot mask
– Spot size
– Circularity measure
– Relative signal to background intensity
– Dapple:
  b-value: fraction of background intensities less than the median foreground intensity
  p-score: extent to which the position of a spot deviates from a rigid rectangular grid
Flag spots based on these criteria
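The b-value definition above translates directly into code. This is a sketch of the definition as worded on the slide, not Dapple's actual implementation; the pixel values and the 0.9 flagging threshold are hypothetical.

```python
import numpy as np

def b_value(foreground, background):
    """Fraction of background pixels below the median foreground intensity."""
    med = np.median(foreground)
    return float(np.mean(np.asarray(background) < med))

fg = [900, 1000, 1100, 950, 1050]             # median is 1000
bg = [100, 120, 90, 1500, 110, 95, 105, 130]  # 7 of 8 pixels below 1000

b = b_value(fg, bg)
flagged = b < 0.9  # hypothetical threshold for flagging a spot
print(b, flagged)
```

A well-separated spot has a b-value close to 1; the single bright background pixel here pulls it down to 0.875.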
Replication
Why?
• To reduce variability
• To increase generalizability
What is it?
• Duplicate spots
• Duplicate slides
  Technical replicates
  Biological replicates
Practical Application of DNA Microarrays
DNA microarrays are used to study gene activity (expression)
– What proteins are being actively produced by a group of cells?
  “Which genes are being expressed?”
How?
– When a cell is making a protein, it transcribes the genes (made of DNA) which code for the protein into RNA used in its production
– The RNA present in a cell can be extracted
– If a gene has been expressed in a cell
  RNA will bind to “a copy of itself” on the array
  RNA with no complementary site will wash off the array
– The RNA can be “tagged” with a fluorescent dye to determine its presence
DNA microarrays provide a high throughput technique for quantifying the presence of specific RNA sequences
Analysis and Management of Microarray Data
Magnitude of Data
– Experiments
  50 000 genes in human
  320 cell types
  2000 compounds
  3 time points
  2 concentrations
  2 replicates
– Data Volume
  4*10^11 data-points
  10^15 = 1 petaB of Data
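The data-point count is the product of the experiment dimensions listed above, as this sketch shows. The bytes-per-data-point figure is only what the two slide numbers imply when divided; the slide does not state it.

```python
genes, cell_types, compounds = 50_000, 320, 2_000
time_points, concentrations, replicates = 3, 2, 2

# Total measurements across the full experimental grid.
data_points = (genes * cell_types * compounds
               * time_points * concentrations * replicates)

# Storage per data point implied by the 1 petabyte (10**15 bytes) figure.
bytes_per_point = 10 ** 15 / data_points

print(data_points, round(bytes_per_point))
```

The exact product is 3.84 * 10^11, which the slide rounds to 4 * 10^11.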
Thanks