Introduction to Microarray Dr G. P. S. Raghava. Molecular Biology Overview Cell Nucleus Chromosome...


Introduction to Microarray

Dr G. P. S. Raghava


Molecular Biology Overview

Cell Nucleus

Chromosome

Protein

Gene (DNA)

Gene (mRNA), single strand


Measuring Gene Expression

Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell. Measuring protein would be more direct, but is currently harder.


(RT)


The Goals

Basic Understanding
– Arrays can take a snapshot of which subset of genes in a cell is actively making proteins
– Heat shock experiments

Medical diagnosis
– Microarrays can indicate where mutations lie that might be linked to a disease. Still others are used to determine whether a person’s genetic profile would make him or her more or less susceptible to drug side effects
– 1999: A GeneChip containing 6800 human genes was used to distinguish between myeloid leukemia and lymphoblastic leukemia using a set of 50 genes that have different activity levels

Drug design
– Pharmaceutical firms are in a rush to translate the human genome results into new products
   Potential profits are huge
   First, though, they must figure out what the genes do, how they interact, and how they relate to diseases.
– Evaluation, Specificity, Response


Microarray Potential Applications

Biological discovery
– new and better molecular diagnostics
– new molecular targets for therapy
– finding and refining biological pathways

Recent examples
– molecular diagnosis of leukemia, breast cancer, ...
– appropriate treatment for genetic signature
– potential new drug targets


History

1980s: antibody-based assay (protein chip?)

~1991: high-density DNA-synthetic chemistry (Affymetrix/oligo chips)

~1995: microspotting (Stanford Univ/cDNA chips)

replacing porous surface with solid surface

replacing radioactive label with fluorescent label

improvement in sensitivity


What is a DNA Microarray?

genes or gene fragments attached to a substrate (glass)

Hybridized slide

Two dyes

Image analyzed

Tens of thousands of spots/genes

=entire genome in 1 experiment

A Revolution in Biology


Gene Expression Microarrays

The main types of gene expression microarrays:

Short oligonucleotide arrays (Affymetrix);

cDNA or spotted arrays (Brown/Botstein);

Long oligonucleotide arrays (Agilent Inkjet).


Terms/Jargon

Stanford/cDNA chip
– one slide/experiment
– one spot
– 1 gene => one spot or a few spots (replicates)
– control: control spots
– control: two fluorescent dyes (Cy3/Cy5)

Affymetrix/oligo chip
– one chip/experiment
– one probe/feature/cell
– 1 gene => many probes (20~25 mers)
– control: match and mismatch cells


Affymetrix Microarrays

50um

1.28cm

~10^7 oligonucleotides, half Perfectly Match mRNA (PM), half have one Mismatch (MM)

Raw gene expression is intensity difference: PM - MM
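The PM - MM difference can be illustrated with a few lines of Python. This is a minimal sketch, not the Affymetrix algorithm: the function name `raw_expression`, the example intensities, and the choice to average the differences over a gene's probe pairs are all assumptions made for illustration.

```python
from statistics import mean


def raw_expression(pm, mm):
    """Average the PM - MM intensity difference over one gene's probe pairs.

    Averaging is an assumption; the slide only states the difference PM - MM.
    """
    if len(pm) != len(mm):
        raise ValueError("need one MM intensity for every PM intensity")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for three probe pairs of a single gene.
pm_intensities = [1200.0, 950.0, 1100.0]
mm_intensities = [300.0, 400.0, 250.0]
print(raw_expression(pm_intensities, mm_intensities))
```

A gene whose mismatch probes are as bright as its perfect-match probes yields a value near zero, which suggests the signal is mostly non-specific binding.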

Raw image


DNA Microarrays

Each probe consists of thousands of strands of identical oligonucleotides
– The DNA sequences at each probe represent important genes (or parts of genes)

Printing Systems
– Ex: HP, Corning Inc.
– Printing systems can build lengths of DNA up to 60 nucleotides long
– 1.28 x 1.28+ cm glass wafer
   Each “print head” has a ~100 µm diameter, and the heads are separated by ~100 µm (~5,000 – 20,000 probes)

Photolithographic Chips
– Ex: Affymetrix
– 1.28 x 1.28 cm glass/silicon wafer
   24 x 24 µm probe site (~500,000 probes)
– Lengths of DNA up to 25 nucleotides long
– Requires a new set of masks for each new array type

GeneChip


The Process

[Figure: sample preparation workflow. Cells -> Poly-A RNA (AAAA) -> cDNA -> IVT (in-vitro transcription, 10% biotin-labeled uracil) -> biotin-labeled (L) antisense cRNA -> Fragment (heat, Mg2+) -> Labeled fragments -> Hybridize -> Wash/stain -> Scan]


Hybridization and Staining

[Figure: GeneChip + biotin-labeled (L) cRNA -> hybridized array; the hybridized array is then stained with SAPE (streptavidin-phycoerythrin)]


Microarray Data

First, the problems:

1. The fabrication process is not error free
2. Probes have a maximum length of 25-60 nucleotides
3. Biological processes such as hybridization are stochastic
4. Background light may skew the fluorescence
5. How do we decide if/how strongly a particular gene is being expressed?

Solutions to these problems are still in their infancy


Affymetrix “Gene chip” system

Uses 25-base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
RNA labeled and scanned in a single “color”
– one sample per chip
Can have as many as 20,000 genes on a chip
Arrays get smaller every year (more genes)
Chips are expensive
Proprietary system: “black box” software, can only use their chips


cDNA Microarray Technologies

Spot cloned cDNAs onto a glass microscope slide
– usually PCR-amplified segments of plasmids
Label 2 RNA samples with 2 different colors of fluorescent dye: control vs. experimental
Mix the two labeled RNAs and hybridize to the chip
Make two scans, one for each color
Combine the images to calculate ratios of the amounts of each RNA that bind to each spot


cDNA microarrays

Compare the genetic expression in two samples of cells

PRINT
cDNA from one gene on each spot

SAMPLES
cDNA labelled red/green

e.g. treatment / control

normal / tumor tissue


HYBRIDIZE

Add equal amounts of labelled cDNA samples to microarray.

SCAN

Laser Detector


“Long Oligos”

Like cDNAs, but instead of using a cloned gene, design a 40-70 base probe to represent each gene
Relies on genome sequence database and bioinformatics
Reduces cross hybridization
Cheaper and possibly more sensitive than Affy. system


Images from scanner

Resolution
– standard 10 µm [currently, max 5 µm]
– 100 µm spot on chip = 10 pixels in diameter

Image format
– TIFF (tagged image file format) 16 bit (65,536 levels of grey)
– 1 cm x 1 cm image at 16 bit = 2 MB (uncompressed)
– other formats exist, e.g. SCN (used at Stanford University)

Separate image for each fluorescent sample
– channel 1, channel 2, etc.
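The resolution and file-size figures above follow from simple arithmetic, sketched here in Python. The function names are hypothetical, and the calculation assumes square pixels, an uncompressed image, and no file header overhead.

```python
def image_size_bytes(side_cm, pixel_um, bits_per_pixel):
    """Uncompressed size of a square scan, ignoring any file header."""
    pixels_per_side = round(side_cm * 10_000 / pixel_um)  # 1 cm = 10,000 µm
    return pixels_per_side ** 2 * bits_per_pixel // 8


def spot_diameter_pixels(spot_um, pixel_um):
    """Number of pixels spanned by one spot diameter."""
    return spot_um / pixel_um


print(image_size_bytes(1, 10, 16))    # 2000000 bytes, i.e. about 2 MB
print(spot_diameter_pixels(100, 10))  # 10.0 pixels
print(image_size_bytes(1, 5, 16))     # 8000000 bytes at 5 µm resolution
```

Halving the pixel size quadruples the file size, so the 5 µm setting costs four times the storage per channel.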


Processing of images

Addressing or gridding
– Assigning coordinates to each of the spots

Segmentation
– Classification of pixels either as foreground or as background

Intensity determination for each spot
– Foreground fluorescence intensity pairs (R, G)
– Background intensities
– Quality measures
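Segmentation and intensity determination can be sketched for a single spot in one channel. This is only an illustration under stated assumptions: it uses a fixed circle around a known spot centre (one of several possible segmentation strategies, and not necessarily the one any given package uses), takes the median as the intensity summary, and the function name `spot_intensities` and the pixel values are made up.

```python
from statistics import median


def spot_intensities(patch, centre_row, centre_col, radius):
    """Classify each pixel of a patch as foreground (inside a fixed circle)
    or background (outside it), then return the median of each group."""
    foreground, background = [], []
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            inside = (r - centre_row) ** 2 + (c - centre_col) ** 2 <= radius ** 2
            (foreground if inside else background).append(value)
    return median(foreground), median(background)


# Hypothetical 5 x 5 patch: a bright spot on a dim background.
patch = [
    [10, 10, 10, 10, 10],
    [10, 90, 95, 90, 10],
    [10, 95, 99, 95, 10],
    [10, 90, 95, 90, 10],
    [10, 10, 10, 10, 10],
]
fg, bg = spot_intensities(patch, centre_row=2, centre_col=2, radius=1.5)
print(fg, bg)
```

Running the same function on the Cy5 and Cy3 images of a spot would give the (R, G) foreground pair and the two background values listed above.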


Images in analysis software

The two 16-bit images (Cy3, Cy5) are compressed into 8-bit images
Display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image
RGB image:
– Blue values (B) are set to 0
– Red values (R) are used for Cy5 intensities
– Green values (G) are used for Cy3 intensities
Qualitative representation of results
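The overlay construction can be sketched per pixel. The slide does not say how the 16-bit values are compressed to 8 bits; the sketch below assumes the simplest option, dropping the low 8 bits, whereas real analysis software may use a non-linear transform. The function names are hypothetical.

```python
def to_8bit(value16):
    """Compress a 16-bit intensity (0..65535) to 8 bits (0..255).

    Assumption: a plain right shift; other transforms are possible.
    """
    return value16 >> 8


def overlay_pixel(cy5_16, cy3_16):
    """Build one 24-bit RGB pixel: R from Cy5, G from Cy3, B fixed at 0."""
    return (to_8bit(cy5_16), to_8bit(cy3_16), 0)


print(overlay_pixel(65535, 65535))  # (255, 255, 0): yellow
print(overlay_pixel(65535, 0))      # (255, 0, 0): red
print(overlay_pixel(0, 65535))      # (0, 255, 0): green
```

Because 256 intensity levels replace 65,536, the overlay is suitable for visual inspection only; quantification should use the original 16-bit images.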


Images: examples

[Figure: Cy3 image, Cy5 image, and pseudo-colour overlay]

Spot colour   Signal strength       Gene expression
yellow        Control = perturbed   unchanged
red           Control < perturbed   induced
green         Control > perturbed   repressed
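The colour table can be turned into a small classifier. This is a sketch under two assumptions not stated on the slide: that Cy5 (red) carries the perturbed sample and Cy3 (green) the control, as the table implies, and that "equal" means "within a two-fold difference". The `fold` threshold and the function name are hypothetical.

```python
def classify_spot(cy5, cy3, fold=2.0):
    """Return (spot colour, expression call) for one spot.

    cy5: perturbed-sample intensity (red channel)
    cy3: control-sample intensity (green channel)
    fold: hypothetical threshold for calling a change
    """
    if cy5 >= fold * cy3:
        return "red", "induced"
    if cy3 >= fold * cy5:
        return "green", "repressed"
    return "yellow", "unchanged"


print(classify_spot(4000, 1000))
print(classify_spot(1000, 4000))
print(classify_spot(1000, 1100))
```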


Quantification of expression

For each spot on the slide we calculate

Red intensity = Rfg - Rbg

(fg = foreground, bg = background) and

Green intensity = Gfg - Gbg

and combine them in the log (base 2) ratio

Log2(Red intensity / Green intensity)
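The calculation above translates directly into code. In this sketch the function name and the example intensities are made up, and returning `None` when a background-corrected intensity is not positive is an assumption; the slides do not say how such spots are handled.

```python
import math


def log_ratio(r_fg, r_bg, g_fg, g_bg):
    """Log2 ratio of background-corrected red and green intensities."""
    red = r_fg - r_bg
    green = g_fg - g_bg
    if red <= 0 or green <= 0:
        return None  # assumption: the ratio is undefined for such spots
    return math.log2(red / green)


print(log_ratio(900, 100, 300, 100))  # 2.0: red is four times green
print(log_ratio(300, 100, 900, 100))  # -2.0: green is four times red
print(log_ratio(500, 100, 500, 100))  # 0.0: equal intensities
```

The base-2 logarithm makes the scale symmetric: a four-fold increase and a four-fold decrease give +2 and -2 rather than 4 and 0.25.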


Gene Expression Data

On p genes for n slides: p is O(10,000), n is O(10-100), but growing.

Genes (rows) x Slides (columns)

Gene expression level of gene 5 in slide 4 = Log2(Red intensity / Green intensity)

Gene   slide 1   slide 2   slide 3   slide 4   slide 5   …
1       0.46      0.30      0.80      1.51      0.90     ...
2      -0.10      0.49      0.24      0.06      0.46     ...
3       0.15      0.74      0.04      0.10      0.20     ...
4      -0.45     -1.03     -0.79     -0.56     -0.32     ...
5      -0.06      1.06      1.35      1.09     -1.09     ...

These values are conventionally displayed on a red (>0) yellow (0) green (<0) scale.
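The genes-by-slides matrix and its colour convention can be sketched as a nested list. The values are the five genes and five slides shown on the slide; the helper names `level` and `colour` are hypothetical.

```python
# Rows are genes 1..5, columns are slides 1..5 (log2 ratios from the slide).
expression = [
    [0.46, 0.30, 0.80, 1.51, 0.90],
    [-0.10, 0.49, 0.24, 0.06, 0.46],
    [0.15, 0.74, 0.04, 0.10, 0.20],
    [-0.45, -1.03, -0.79, -0.56, -0.32],
    [-0.06, 1.06, 1.35, 1.09, -1.09],
]


def level(gene, slide):
    """Expression level using the slide's 1-based gene and slide numbers."""
    return expression[gene - 1][slide - 1]


def colour(value):
    """Display colour for a log ratio: red (>0), yellow (0), green (<0)."""
    if value > 0:
        return "red"
    if value < 0:
        return "green"
    return "yellow"


print(level(5, 4), colour(level(5, 4)))  # 1.09 red
print(level(4, 2), colour(level(4, 2)))  # -1.03 green
```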


[Figure: analysis workflow. Biological question (differentially expressed genes, sample class prediction, etc.) -> Experimental design -> Microarray experiment -> 16-bit TIFF files -> Image analysis -> (Rfg, Rbg), (Gfg, Gbg) -> Normalization -> R, G -> Estimation, Testing, Clustering, Discrimination -> Biological verification and interpretation]


Quality control (-> Flag)

How good are foreground and background measurements?
– Variability measures in pixel values within each spot mask
– Spot size
– Circularity measure
– Relative signal to background intensity
– Dapple:
   b-value: fraction of background intensities less than the median foreground intensity
   p-score: extent to which the position of a spot deviates from a rigid rectangular grid

Flag spots based on these criteria
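The b-value definition above is concrete enough to sketch. This follows only the wording on the slide, not Dapple's actual source code; the flagging threshold `min_b`, the function names, and the pixel values are hypothetical.

```python
from statistics import median


def b_value(foreground, background):
    """Fraction of background pixel intensities below the median foreground."""
    median_fg = median(foreground)
    return sum(1 for b in background if b < median_fg) / len(background)


def flag_spot(foreground, background, min_b=0.9):
    """Flag a spot as suspect when its b-value falls below min_b.

    min_b is a made-up threshold for illustration.
    """
    return b_value(foreground, background) < min_b


fg_pixels = [100, 120, 110]
bg_pixels = [10, 20, 115, 15]
print(b_value(fg_pixels, bg_pixels))    # 0.75
print(flag_spot(fg_pixels, bg_pixels))  # True
```

A clean spot has a b-value close to 1, because nearly every background pixel is dimmer than the typical foreground pixel.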


Replication

Why?
• To reduce variability
• To increase generalizability

What is it?
• Duplicate spots
• Duplicate slides
   Technical replicates
   Biological replicates
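One way to see why replication reduces variability: the standard error of a mean shrinks as the number of replicates grows. The sketch below summarises replicate log ratios for one gene; the function name and the values are hypothetical, and it treats replicates as independent measurements, which is more defensible for biological than for technical replicates.

```python
from math import sqrt
from statistics import mean, stdev


def summarise_replicates(log_ratios):
    """Return (mean, standard error of the mean) for replicate log ratios."""
    n = len(log_ratios)
    return mean(log_ratios), stdev(log_ratios) / sqrt(n)


# Hypothetical log2 ratios for one gene measured on three replicate slides.
centre, standard_error = summarise_replicates([1.0, 1.2, 0.8])
print(centre, standard_error)
```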


Practical Application of DNA Microarrays

DNA microarrays are used to study gene activity (expression)
– What proteins are being actively produced by a group of cells?
   “Which genes are being expressed?”

How?
– When a cell is making a protein, it transcribes the genes (made of DNA) which code for the protein into RNA used in its production
– The RNA present in a cell can be extracted
– If a gene has been expressed in a cell
   RNA will bind to “a copy of itself” on the array
   RNA with no complementary site will wash off the array
– The RNA can be “tagged” with a fluorescent dye to determine its presence

DNA microarrays provide a high-throughput technique for quantifying the presence of specific RNA sequences


Analysis and Management of Microarray Data

Magnitude of Data
– Experiments
   50,000 genes in human
   320 cell types
   2,000 compounds
   3 time points
   2 concentrations
   2 replicates
– Data Volume
   4*10^11 data points
   10^15 = 1 petabyte of data
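The data-point count is the product of the experimental factors listed above, as this short sketch shows. The slide does not explain how the petabyte figure is derived; the last line only computes the storage per data point that it would imply, which is an inference rather than a stated fact.

```python
from math import prod

# Experimental factors from the slide.
factors = {
    "genes": 50_000,
    "cell types": 320,
    "compounds": 2_000,
    "time points": 3,
    "concentrations": 2,
    "replicates": 2,
}

data_points = prod(factors.values())
print(data_points)  # 384000000000, i.e. about 4*10^11

# Implied storage per data point if the total is 10^15 bytes (an inference).
bytes_per_point = 10 ** 15 / data_points
print(round(bytes_per_point))
```

The implied figure of roughly 2,600 bytes per data point suggests the petabyte estimate counts raw image data and annotations rather than a single stored number per measurement.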


Thanks