CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background...

Page 1

CS491JH: Data Mining in Bioinformatics

Introduction to Microarray Technology

•Technology Background

•Data Processing Procedure

•Characteristics of Data

•Data integration and Data mining

Page 2

Substrates for High Throughput Arrays

•Nylon Membrane: single label (P33)

•Glass Slides: dual label (Cy3, Cy5)

•GeneChip: single label (biotin-streptavidin)

Page 3

GeneChip® Probe Arrays

•GeneChip Probe Array: 1.28 cm

•Hybridized Probe Cell: 24 µm

•Millions of copies of a specific oligonucleotide probe

•>200,000 different complementary probes

•Single stranded, labeled RNA target

•Oligonucleotide probe

•Image of Hybridized Probe Array

Page 4

GeneChip® Expression Array Design

•Gene Sequence (5´ to 3´)

•Multiple oligo probes

•Probes designed to be Perfect Match

•Probes designed to be Mismatch
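The perfect match / mismatch pairing can be illustrated with a short sketch. It assumes 25-mer probes in which the mismatch probe is identical to the perfect match probe except at the central base (position 13), which is replaced by its complement. The helper name `mismatch_probe` and the example sequence are hypothetical and do not come from the slide.

```python
# Hypothetical helper: derive a mismatch (MM) probe from a perfect match (PM) probe.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(pm: str) -> str:
    """Return the MM probe: same as PM except the central base is complemented."""
    mid = len(pm) // 2  # index 12 for a 25-mer, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]

pm = "ACGTACGTACGTACGTACGTACGTA"  # made-up 25-mer, not a real probe sequence
mm = mismatch_probe(pm)
```

Comparing the signal of each PM probe with its MM partner gives a rough handle on nonspecific hybridization.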

Page 5

Procedures for Target Preparation

Cells → Poly(A)+ / Total RNA → cDNA → IVT (Biotin-UTP, Biotin-CTP) → Labeled transcript → Fragment (heat, Mg2+) → Labeled fragments → Hybridize (16 hours) → Wash & Stain → Scan

Page 6

Microarray Technology

Page 7

NSF Soybean Functional Genomics, Steve Clough / Vodkin Lab

Printing Arrays on 50 slides

Page 8

Ratio of expression of genes from two sources

•Cells from condition A and cells from condition B

•Total RNA or mRNA from each source

•cDNA from each source, one labeled with Dye 1 and the other with Dye 2

•Mix

•Outcome per gene: equal, over, under

NSF / U of Illinois Microarray Workshop, Steve Clough / Vodkin Lab
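The equal / over / under outcome can be sketched as a log-ratio test on the two dye intensities measured for one spot. This is only an illustration: the function name `classify_spot` and the twofold cutoff are assumptions and are not taken from the slide.

```python
import math

def classify_spot(intensity_a: float, intensity_b: float, fold: float = 2.0) -> str:
    """Label a spot 'over', 'under' or 'equal' for source A relative to source B.

    Assumes both intensities are positive and already background-corrected.
    """
    log_ratio = math.log2(intensity_a / intensity_b)
    cutoff = math.log2(fold)  # assumed threshold, symmetric on the log scale
    if log_ratio >= cutoff:
        return "over"
    if log_ratio <= -cutoff:
        return "under"
    return "equal"
```

Working on the log scale treats a fourfold increase and a fourfold decrease symmetrically (+2 and -2), which a raw ratio does not.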

Page 9

GSI Lumonics

NSF Soybean Functional Genomics, Steve Clough / Vodkin Lab

Page 10

Cattle and Soy Controls

•Beta Actin

•PKG

•HPRT

•Beta 2 microglobulin

•Rubisco

•AB binding protein

•Major latex protein homologue (MSG)

Array of cattle and soy spiking controls. 50 µg of cattle brain total RNA was labeled with Cy3 (green). 1 µl each of in vitro transcribed soy Rubisco (5 ng), AB binding protein (0.5 ng) and MSG (0.05 ng) were labeled with Cy5. The two labeled samples were cohybridized on superamine slides (Telechem, Inc.). To the right of each set of spots are five negative controls (water).

Page 11

Fetal Spleen (Cy3), Adult Spleen (Cy5)

Labeled spots: IgM, IgM heavy chain, MYLK, COL1A2

Page 12

Placenta vs. Brain – 3800 Cattle Placenta Array cy3 cy5

GenePix Image Analysis Software

Page 13

GeneFilter Comparison Report

GeneFilter 1 Name: O2#1 8-20-99adjfinal
GeneFilter 2 Name: N2#1finaladj

Intensities are given raw and normalized for each filter (GF1, GF2).

ORF NAME   GENE NAME  CHRM  F  G  R  C  RAW GF1  RAW GF2   NORM GF1   NORM GF2  DIFFERENCE  RATIO
YAL001C    TFC3          1  1  A  1  2    12.03     7.38     403.83     209.79      194.04   1.92
YBL080C    PET112        2  1  A  1  3    53.21    35.62   1,786.11   1,013.13      772.98   1.76
YBR154C    RPB5          2  1  A  1  4    79.26    78.51   2,660.73   2,232.86      427.87   1.19
YCL044C    -             3  1  A  1  5    53.22    44.66   1,786.53   1,270.12      516.41   1.41
YDL020C    SON1          4  1  A  1  6    23.80    20.34     799.06     578.42      220.64   1.38
YDL211C    -             4  1  A  1  7    17.31    35.34     581.00   1,005.18     -424.18  -1.73
YDR155C    CPH1          4  1  A  1  8   349.78   401.84  11,741.98  11,428.10      313.88   1.03
YDR346C    -             4  1  A  1  9    64.97    65.88   2,180.87   1,873.67      307.21   1.16
YAL010C    MDM10         1  1  A  2  2    13.73     9.61     461.03     273.36      187.67   1.69
YBL088C    TEL1          2  1  A  2  3     8.50     7.74     285.38     220.01       65.37   1.30
YBR162C    -             2  1  A  2  4   226.84   293.83   7,614.82   8,356.39     -741.57  -1.10
YCL052C    PBN1          3  1  A  2  5    41.28    34.79   1,385.79     989.41      396.38   1.40
YDL028C    MPS1          4  1  A  2  6     7.95     6.24     266.99     177.34       89.65   1.51
YDL219W    -             4  1  A  2  7    16.08    11.33     539.93     322.20      217.74   1.68
YDR163W    -             4  1  A  2  8    19.13    14.19     642.17     403.56      238.61   1.59
YDR354W    TRP4          4  1  A  2  9    62.24    40.74   2,089.48   1,158.64      930.84   1.80
YAL018C    -             1  1  A  3  2    10.72     8.81     359.75     250.60      109.15   1.44
YBL096C    -             2  1  A  3  3    10.91     8.98     366.40     255.40      111.00   1.43
YBR169C    SSE2          2  1  A  3  4    17.33    27.81     581.80     790.84     -209.05  -1.36
YCL060C    -             3  1  A  3  5    17.99    24.75     603.96     703.75      -99.79  -1.17
YDL036C    -             4  1  A  3  6    14.22     8.86     477.39     251.94      225.44   1.89
YDL227C    HO            4  1  A  3  7    25.61    31.52     859.71     896.46      -36.75  -1.04
YDR171W    HSP42         4  1  A  3  8   102.08    98.37   3,426.83   2,797.58      629.25   1.22
YDR362C    -             4  1  A  3  9    16.32    12.95     547.96     368.39      179.57   1.49
YAL026C    DRS2          1  1  A  4  2    11.32     7.97     379.85     226.53      153.33   1.68
YBL102W    SFT2          2  1  A  4  3    55.88    63.74   1,875.82   1,812.81       63.02   1.03
YBR177C    -             2  1  A  4  4    63.31    29.03   2,125.20     825.60    1,299.60   2.57
YCL068C    -             3  1  A  4  5     8.33     4.47     279.51     127.16      152.35   2.20
YDL044C    MTF2          4  1  A  4  6    11.73     6.96     393.88     198.07      195.81   1.99
YDL235C    YPD1          4  1  A  4  7    38.71    30.20   1,299.33     858.83      440.50   1.51
YDR179C    -             4  1  A  4  8    12.77    11.05     428.60     314.12      114.48   1.36
YDR370C    -             4  1  A  4  9    16.70    15.30     560.62     435.13      125.49   1.29
YAL034C    FUN19         1  1  A  5  2    20.89    24.21     701.32     688.59       12.73   1.02
YBL111C    -             2  1  A  5  3    22.38    13.67     751.39     388.69      362.70   1.93
YBR185C    MBA1          2  1  A  5  4    38.42    19.96   1,289.61     567.78      721.83   2.27
YCLX03C    -             3  1  A  5  5     8.69     3.66     291.77     104.11      187.66   2.80
YDL052C    SLC1          4  1  A  5  6    52.37    49.87   1,758.05   1,418.33      339.73   1.24
YDL243C    -             4  1  A  5  7    15.56    12.95     522.24     368.30      153.94   1.42
YDR186C    -             4  1  A  5  8    16.48    15.01     553.30     426.75      126.55   1.30
YDR378C    -             4  1  A  5  9    31.13    28.08   1,045.01     798.50      246.50   1.31
YAL040C    CLN3          1  1  A  6  2   126.65   107.34   4,251.70   3,052.61    1,199.08   1.39
YBR006W    -             2  1  A  6  3    22.74    11.10     763.49     315.55      447.94   2.42
YBR193C    -             2  1  A  6  4    14.81    15.55     497.07     442.20       54.87   1.12
YCLX11W    -             3  1  A  6  5   161.96   175.34   5,436.86   4,986.41      450.44   1.09
YDL060W    -             4  1  A  6  6    29.84    37.13   1,001.65   1,055.98      -54.34  -1.05
YDR003W    -             4  1  A  6  7    23.99    23.22     805.48     660.25      145.22   1.22
YDR194C    MSS116        4  1  A  6  8    66.58    47.16   2,235.07   1,341.29      893.78   1.67
YDR386W    -             4  1  A  6  9    11.27     5.75     378.27     163.46      214.81   2.31
YAL047C    -             1  1  A  7  2    15.54    11.30     521.74     321.28      200.46   1.62
YBR012W-B  -             2  1  A  7  3    54.70    79.97   1,836.29   2,274.15     -437.86  -1.24
YBR201W    DER1          2  1  A  7  4    21.67    19.57     727.49     556.64      170.85   1.31
YCR007C    -             3  1  A  7  5    25.02    15.96     840.01     453.76      386.25   1.85
YDL068W    -             4  1  A  7  6    18.32    13.11     614.83     372.78      242.05   1.65
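The DIFFERENCE and RATIO columns appear to follow a simple rule that can be checked against the rows of the report: the difference is normalized GF1 minus normalized GF2, and the ratio is the larger normalized value divided by the smaller, with a negative sign when GF2 is the larger. This rule is inferred from the numbers, not stated in the report, and the function name `compare` is hypothetical.

```python
def compare(gf1: float, gf2: float) -> tuple[float, float]:
    """Return (difference, signed ratio) for two normalized intensities.

    The sign convention is an inference from the report's rows: the ratio is
    positive when GF1 >= GF2 and negative otherwise.
    """
    difference = gf1 - gf2
    ratio = gf1 / gf2 if gf1 >= gf2 else -(gf2 / gf1)
    return round(difference, 2), round(ratio, 2)

yal001c = compare(403.83, 209.79)    # normalized values from the YAL001C row
ydl211c = compare(581.00, 1005.18)   # normalized values from the YDL211C row
```

Under this convention no ratio falls between -1 and 1, so the sign alone tells which filter gave the higher signal.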

Page 14

Microarray Data Process

1. Experimental Design

2. Image Analysis – raw data

3. Normalization – “clean” data

4. Data Filtering – informative data

5. Model building

6. Data Mining (clustering, pattern recognition, etc.)

7. Validation
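Steps 3 and 4 of the process can be sketched in a few lines. This is a minimal illustration under stated assumptions: normalization is taken to be scaling each array to a common mean intensity, and filtering keeps genes whose fold change between two arrays reaches a cutoff. The function names, the target mean of 1000 and the 1.5-fold cutoff are all hypothetical choices, and the intensities are made up.

```python
def normalize(raw, target_mean=1000.0):
    """Step 3: scale one array's raw intensities so their mean equals target_mean."""
    factor = target_mean / (sum(raw) / len(raw))
    return [value * factor for value in raw]

def informative(norm_a, norm_b, min_fold=1.5):
    """Step 4: return indices of genes whose fold change is at least min_fold.

    Assumes all intensities are positive.
    """
    keep = []
    for i, (a, b) in enumerate(zip(norm_a, norm_b)):
        fold = max(a, b) / min(a, b)
        if fold >= min_fold:
            keep.append(i)
    return keep

raw_a = [12.0, 53.0, 79.0, 56.0]     # made-up raw intensities, array A
raw_b = [14.0, 100.0, 160.0, 126.0]  # made-up raw intensities, array B
norm_a = normalize(raw_a)
norm_b = normalize(raw_b)
kept = informative(norm_a, norm_b)   # indices of genes passed on to later steps
```

Array B is about twice as bright as array A overall, so comparing raw values would flag almost every gene. After scaling, only the first gene still differs by more than the cutoff.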

Page 15

Scatterplot of Normalized Data

Axes: Adult, Fetal

Page 16

> 0.3, < -0.3

Page 17

Characteristics of Data

Data can be viewed as an N×M matrix (N >> M):

N is the number of genes

M is the number of data points for each gene

Or as an N×(M+K) matrix:

K is the number of features describing each gene (genome location, functional description, metabolic pathway, etc.)
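One way to hold the N×(M+K) view in code is a record per gene with M measurements and K descriptive features. The sketch below is an assumption about layout, not something the slides prescribe. The class name `GeneRecord` and the feature keys are hypothetical, and the values are the normalized intensities of three genes from the GeneFilter report.

```python
from dataclasses import dataclass, field

@dataclass
class GeneRecord:
    """One row of the N x (M + K) view: M measurements plus K annotation features."""
    orf: str
    expression: list[float]                                   # M data points
    features: dict[str, str] = field(default_factory=dict)    # K descriptive features

records = [
    GeneRecord("YAL001C", [403.83, 209.79], {"gene_name": "TFC3", "chromosome": "1"}),
    GeneRecord("YBL080C", [1786.11, 1013.13], {"gene_name": "PET112", "chromosome": "2"}),
    GeneRecord("YBR154C", [2660.73, 2232.86], {"gene_name": "RPB5", "chromosome": "2"}),
]

n = len(records)                 # N genes
m = len(records[0].expression)   # M data points per gene
k = len(records[0].features)     # K features per gene
matrix = [r.expression for r in records]  # the purely numeric N x M part
```

Keeping the numeric part separate from the annotations lets clustering code work on `matrix` alone while the features stay available for interpreting the result.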

Page 18

Model for Data Analysis

•Gene Expression is a Dynamic Process

•Each Microarray Experiment is a snapshot of the process

•Need basic biological knowledge to build model

For Example:

Assumption – In most experiments, only a small set of genes (100s/1000s) is affected significantly.
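This assumption is what justifies simple global normalization: if most genes are unchanged, the typical log ratio should be zero, and any common offset can be treated as technical bias. A minimal sketch of median centering follows. The function name `median_center` and the ratios are made up for illustration.

```python
import math
from statistics import median

def median_center(ratios):
    """Return log2 ratios shifted so that their median is zero."""
    logs = [math.log2(r) for r in ratios]
    shift = median(logs)  # estimate of the global bias, valid if most genes are unchanged
    return [value - shift for value in logs]

# Made-up ratios: a global twofold bias, with the last two genes truly changed.
ratios = [2.0, 2.0, 2.0, 8.0, 0.5]
centered = median_center(ratios)
```

The three unchanged genes land at zero, and the two affected genes stand out at +2 and -2. If a large fraction of genes really did change, the median would no longer estimate the bias and this correction would distort the data.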

Page 19

Data Mining

Need for Data Mining

•Data volumes are too large for traditional analysis methods: large number of records and high dimensional data

•Only a small portion of the data is analyzed

•Decision support process becomes more complex

Functions of Data Mining

•Use the data to build predictors: prediction, classification, deviation detection, segmentation

•Generate more sophisticated summaries and reports to aid understanding of the data: find clusters, partitions in data

Page 20

Data Mining Methods

Classification, Regression (Predictive Modeling)

Clustering (Segmentation)

Association Discovery (Summarization)

Change and deviation detection

Dependency Modeling

Information Visualization

Page 21

Clustered display of data from time course of serum stimulation of primary human fibroblasts.

Labeled clusters:

•Cholesterol Biosynthesis

•Cell Cycle

•Immediate Early Response

•Signaling and Angiogenesis

•Wound Healing and Tissue Remodeling

Eisen et al., Proc. Natl. Acad. Sci. USA 95 (1998), p. 14865
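A clustered display like this one comes from hierarchical clustering of gene expression profiles. The sketch below is a minimal illustration, assuming Pearson correlation as the similarity measure and plain average linkage. The slide does not give the details of the published procedure, which may differ. The function names and the four toy profiles are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def average_linkage(profiles, n_clusters):
    """Merge the two most similar clusters until n_clusters remain.

    Similarity between clusters is the mean pairwise Pearson correlation.
    Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = mean(pearson(profiles[a], profiles[b])
                           for a in clusters[i] for b in clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters

profiles = [
    [0.0, 1.0, 2.0, 3.0],  # rising over the time course
    [0.1, 0.9, 2.2, 2.8],  # rising
    [3.0, 2.0, 1.0, 0.0],  # falling
    [2.9, 2.1, 0.8, 0.2],  # falling
]
groups = average_linkage(profiles, 2)
```

Correlation groups genes by the shape of their profile over time rather than by absolute level, which is why genes from one pathway can end up in one block of the display.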

Page 24

Self Organizing Maps

Page 25

Molecular Classification of Cancer

Page 27

Gene Expression Profile of Aging and Its Retardation by Caloric Restriction

Cheol-Koo Lee, Roger G. Klopp, Richard Weindruch, Tomas A. Prolla

Page 28

Expression Landscape of cell-cycle regulated genes in yeast

Page 29

Multidimensional data visualization
