CAP 5510: Introduction to Bioinformatics

3/3/08 CAP5510 1

Giri NarasimhanECS 254; Phone: x3748

giri@cis.fiu.edu

www.cis.fiu.edu/~giri/teach/BioinfS07.html

3/3/08 CAP5510 2

Gene g

Probe 1 Probe 2 Probe N…

3/3/08 CAP5510 3

Microarray Data

Expression LevelGene

3/3/08 CAP5510 4

Sample

Treated Sample(t1) Expt 1

Treated Sample(tn) Expt n

Study effect of treatment over time

3/3/08 CAP5510 5

http://www.arabidopsis.org/info/2010_projects/comp_proj/AFGC/RevisedAFGC/Friday/

3/3/08 CAP5510 6

How to compare 2 cell samples with Two-ColorMicroarrays?

mRNA from sample 1 is extracted and labeled with a red

fluorescent dye.

mRNA from sample 2 is extracted and labeled with a green

fluorescent dye.

Mix the samples and apply it to every spot on the

microarray. Hybridize sample mixture to probes.

Use optical detector to measure the amount of green and

red fluorescence at each spot.

3/3/08 CAP5510 7

Variations in cells/individuals

Variations in mRNA extraction, isolation, introduction of dye, variationin dye incorporation, dye interference

Variations in probe concentration, probe amounts, substrate surfacecharacteristics

Variations in hybridization conditions and kinetics

Variations in optical measurements, spot misalignments, discretizationeffects, noise due to scanner lens and laser irregularities

Cross-hybridization of sequences with high sequence identity

Limit of factor 2 in precision of results

Variation changes with intensity: larger variation at low or highexpression levels

Sources of Variations & Experimental Errors

Need to Normalize data

3/3/08 CAP5510 8

ClusteringClustering is a general method to study patterns in

gene expressions.

Several known methods:

Hierarchical Clustering (Bottom-Up Approach)

K-means Clustering (Top-Down Approach)

Self-Organizing Maps (SOM)

3/3/08 CAP5510 9

Hierarchical Clustering: Example

0 1 2 3 4 5 6 7 8 9 10

3/3/08 CAP5510 10

A Dendrogram

3/3/08 CAP5510 11

Hierarchical Clustering [Johnson, SC, 1967]

Given n points in Rd, compute the distance betweenevery pair of points

While (not done)Pick closest pair of points si and sj and make them part ofthe same cluster.

Replace the pair by an average of the two sij

Try the applet at:http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html

3/3/08 CAP5510 12

Distance Metrics

For clustering, define a distance function:

Euclidean distance metrics

Pearson correlation coefficient

k=2: Euclidean Distance

kiik YXYXD

)(),( ==

xyYYXX

1-1 xy 1

3/3/08 CAP5510 13

3/3/08 CAP5510 14

Clustering of gene expressions

Represent each gene as a vector or a point in d-

space where d is the number of arrays or

experiments being analyzed.

3/3/08 CAP5510 15

From Eisen MB, et al, PNAS 1998 95(25):14863-8

Clustering Random vs. Biological

3/3/08 CAP5510 16

K-Means Clustering: Example

Example from Andrew Moore’s tutorial on Clustering.

3/3/08 CAP5510 17

3/3/08 CAP5510 18

3/3/08 CAP5510 19

3/3/08 CAP5510 20

3/3/08 CAP5510 21

K-Means Clustering [McQueen 67]

RepeatStart with randomly chosen cluster centers

Assign points to give greatest increase in score

Recompute cluster centers

Reassign points

until (no changes)Try the applet at:

http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html

3/3/08 CAP5510 22

Comparisons

Hierarchical clusteringNumber of clusters not preset.

Complete hierarchy of clusters

Not very robust, not very efficient.

K-MeansNeed definition of a mean. Categorical data?

More efficient and often finds optimum clustering.

3/3/08 CAP5510 23

Functionally related

genes behave similarly

across experiments

3/3/08 CAP5510 24

Self-Organizing Maps [Kohonen]

Kind of neural network.

Clusters data and find complex relationships

between clusters.

Helps reduce the dimensionality of the data.

Map of 1 or 2 dimensions produced.

Unsupervised Clustering

Like K-Means, except for visualization

3/3/08 CAP5510 25

SOM Architectures

2-D Grid

3-D Grid

Hexagonal Grid

3/3/08 CAP5510 26

SOM Algorithm

Select SOM architecture, and initialize weight

vectors and other parameters.

While (stopping condition not satisfied) do for each

input point x

winning node q has weight vector closest to x.

Update weight vector of q and its neighbors.

Reduce neighborhood size and learning rate.

3/3/08 CAP5510 27

SOM Algorithm Details

Distance between x and weight vector:

Winning node:

Weight update function (for neighbors):

Learning rate:

)]()()[,,()()1( kwkxixkkwkw iii +=+ μ

wxxq = min)(

exp)(),,( 0μxqi rr

3/3/08 CAP5510 28

World Bank Statistics

Data: World Bank statistics of countries in 1992.

39 indicators considered e.g., health, nutrition,

educational services, etc.

The complex joint effect of these factors can can

be visualized by organizing the countries using the

self-organizing map.

3/3/08 CAP5510 29

World Poverty PCA

3/3/08 CAP5510 30

World Poverty SOM

3/3/08 CAP5510 31

World Poverty Map

3/3/08 CAP5510 32

3/3/08 CAP5510 33

3/3/08 CAP5510 34

Viewing SOM Clusters on PCA axes

3/3/08 CAP5510 35

2 3 4 5 6

SOM Example [Xiao-rui He]

3/3/08 CAP5510 36

Neural Networks

Input X

Synaptic

Weights W

ƒ(•)

Output y

3/3/08 CAP5510 37

Learning NN

Adaptive Algorithm

Input X

Weights W

Desired Response

3/3/08 CAP5510 38

Types of NNs

Recurrent NN

Feed-forward NN

Layered

Other issuesHidden layers possible

Different activation functions possible

3/3/08 CAP5510 39

Application: Secondary Structure Prediction

CAP 5510: Introduction to Bioinformatics

Documents

Transcript of CAP 5510: Introduction to Bioinformatics

CAP 5510 Lecture 3 Protein Structuressuchen/cap5510fall/ChenLecture... · 2006-10-23 · CAP 5510 Lecture 3 Protein Structures Su-Shing Chen Bioinformatics CISE. 8/19/2005 Su-Shing

Daewoo Rptv Ds j4710cra 5510

232 Service Manual -Aspire 5510

Aspire 5510

ADAM 5510 Manual

5510 NE Elam Young Parkway

DSE Model 5510 Autostart Control and Instrumentation ... · DSE Model 5510 Autostart Control and Instrumentation System Operators Manual 2 INTRODUCTION The DSE 5510 Module has been

Deep Sea 5510 Manual

Configuración Firewall - CISCO ASA 5510

Rt2g 5510 00 Stair Detail

CAP 5510 Lecture 1 Introductionsuchen/cap5510fall/ChenLecture...0 2,000,000 4,000,000 6,000,000 8,000,000 10,000,000 12,000,000 1965 1970 1975 1980 1985 1990 1995 2000 National Library

5506 5510.output

SECNAVINST 5510

Adam 5510 Tcp

DISTRICT 5510 TODAY - Home Page | ClubRunner July 09.pdf · Former District 5510 First ... Byron Harrington, Chair ... DISTRICT 5510 TODAY Page 4 WHY NOT TOOT YOUR CLUB’S HORN DURING

CAP 5510/CGS 5166: Bioinformatics & Bioinformatic Tools

F-5510 for online viewing.pdf

Sequencing and Mapping CAP 5937-01 Bioinformatics Fall 2004 Amar Mukherjee.

BT 5510 Digital Cordless Phone

5510 0101 Mk3 Product Information