Fuzzy K means. ● A gene can be assigned to several clusters ● Each gene is assigned to a cluster...

  • date post

    21-Dec-2015

Page 1: Fuzzy K means. ● A gene can be assigned to several clusters ● Each gene is assigned to a cluster with a membership value between 0 and 1 ● The membership.

Fuzzy K means

Fuzzy K means

● A gene can be assigned to several clusters

● Each gene is assigned to a cluster with a membership value between 0 and 1

● The membership values of a gene add up to one

● Genes with lower membership values are not well represented by the cluster centroid

● The expression of genes with high membership values is close to the cluster centroid

Centroid

• During the centroid refinement in each clustering cycle, new centroids were calculated as the weighted mean of all the gene-expression patterns in the data set, according to:

Membership Function

• Each gene’s membership m (a continuous variable from 0 to 1) is defined as:
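The formulas on this slide and the preceding one were not captured in the transcript. For orientation only, the textbook fuzzy c-means updates with a fuzziness exponent of 2 are sketched below. This is the standard formulation, assumed here rather than confirmed to be the exact expression on the slides. The symbols are assumptions: $d_{ij}$ is the distance between gene $X_i$ and centroid $V_j$ (for expression data this is often one minus the Pearson correlation), $m_{ij}$ is the membership of gene $i$ in cluster $j$, and $K$ is the number of clusters. Any per-gene weight, such as the one on the next slide, would presumably multiply each term of the centroid sums.

```latex
m_{ij} = \frac{1 / d_{ij}^{2}}{\sum_{k=1}^{K} 1 / d_{ik}^{2}},
\qquad
\sum_{j=1}^{K} m_{ij} = 1,
\qquad
V_j = \frac{\sum_{i} m_{ij}^{2} \, X_i}{\sum_{i} m_{ij}^{2}}
```

With this form, a gene that lies close to one centroid and far from the others gets a membership near 1 for that cluster, and it contributes most strongly to that centroid's weighted mean.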

Fuzzy K means

• The gene weight (used only in the second and third rounds) is empirically defined as:

where $C_{X_i,X_n}$ is the Pearson correlation between $X_i$ and $X_n$, and a second term (its symbol was lost in extraction) is the correlation cutoff

Fuzzy K means

• In each clustering cycle, the centroids were iteratively refined until the average change was < 0.001.

• Around 85% of the centroids stabilized within approximately 15 iterations; some centroids required more, about 40-60 iterations, before stabilizing.
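The refinement loop described above can be sketched as follows. This is a minimal illustration, not the presenter's implementation. It assumes a Pearson-correlation distance, a fuzziness exponent of 2, and that "average change" means the mean absolute change in membership values between iterations. The names `pearson_distance`, `fuzzy_kmeans`, `tol` and `eps` are hypothetical.

```python
import numpy as np

def pearson_distance(X, V):
    """Return the (genes x clusters) matrix of 1 - Pearson correlation."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Vc = V - V.mean(axis=1, keepdims=True)
    Xn = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
    Vn = Vc / np.linalg.norm(Vc, axis=1, keepdims=True)
    return 1.0 - Xn @ Vn.T

def fuzzy_kmeans(X, V0, tol=1e-3, max_iter=100, eps=1e-9):
    """Refine centroids until memberships change by less than tol on average.

    Assumption: the stopping rule is the mean absolute change in memberships;
    the slides only say "average change".
    """
    V = np.asarray(V0, dtype=float).copy()
    M_prev = None
    for n_iter in range(1, max_iter + 1):
        # Memberships: inverse squared distance, normalised to sum to 1 per gene.
        inv = 1.0 / (pearson_distance(X, V) + eps) ** 2
        M = inv / inv.sum(axis=1, keepdims=True)
        # Centroids: weighted mean of all genes, with weights m^2.
        W = M ** 2
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        if M_prev is not None and np.abs(M - M_prev).mean() < tol:
            break
        M_prev = M
    return V, M, n_iter

# Toy data: 60 hypothetical genes measured under 8 conditions.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 8))
V_demo, M_demo, iters_demo = fuzzy_kmeans(X_demo, X_demo[:3])
```

Capping the loop with `max_iter` matters in practice, since the slide notes that some centroids need 40-60 iterations to stabilize.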

Fuzzy K means

• After each clustering cycle, each centroid was compared to all other centroids in the set, and centroid pairs with a Pearson correlation > 0.9 were replaced by their average.

Visualization Tools

Cells respond to environment

Heat

Food supply

Responds to environmental conditions

Various external messages

Genome is fixed – Cells are dynamic

• A genome is static

– Every cell in our body has a copy of the same genome

• A cell is dynamic

– Responds to external conditions

– Saccharomyces cerevisiae cells follow a cell cycle and divide by budding

• Cells differentiate during development

Gene regulation

• Gene regulation is responsible for the dynamic behavior of the cell

• Gene expression varies according to:

– Cell type

– External conditions

Transcription Factors Binding to DNA

• Transcription regulation:

• Certain transcription factors bind DNA

• Binding recognizes DNA substrings:

• Regulatory motifs

Regulation of Genes

Gene

Regulatory Element

RNA polymerase (Protein)

Transcription Factor (Protein)

DNA

Regulation of Genes

Gene

RNA polymerase

Transcription Factor (Protein)

Regulatory Element

DNA

Regulation of Genes

Gene

RNA polymerase

Transcription Factor

Regulatory Element

DNA

New protein

The Challenges of Gene Expression Data

• Many genes have expression data patterns that are similar to multiple, distinct gene groups.

Results of Clustering Gene Expression

• CLUSTER is simple and easy to use

• De facto standard for microarray analysis

• Limitations:

– Hierarchical clustering, like other clustering methods in general, is not robust

– Genes may belong to more than one cluster

• A gene can be co-expressed with different gene groups in response to different conditions.

Saccharomyces cerevisiae

• The yeast Saccharomyces cerevisiae possesses sophisticated mechanisms to choreograph the expression of its 6200 genes in order to thrive, or at least to survive, in a wide range of environmental conditions.

• The gene expression of 40 Yap1p targets: these genes were coordinately induced in response to a subset of the conditions shown here (labeled in red).

What is a microarray

What is a microarray (2)

• A 2D array of DNA sequences from thousands of genes

• Each spot has many copies of the same gene

• Allow mRNAs from a sample to hybridize

• Measure number of hybridizations per spot

Goal of Microarray Experiments

• Measure level of gene expression across many different conditions:

– Expression matrix M: {genes} × {conditions}:

$M_{ij} = |\mathrm{gene}_i|$ in condition $j$ (the expression level of gene $i$ under condition $j$)

• Deduce gene function

– Genes with similar function are expressed under similar conditions
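As a small illustration of the expression matrix and of "expressed under similar conditions", the sketch below builds a toy genes × conditions matrix and compares genes by Pearson correlation across conditions. All gene names, condition names and values are made up for the example.

```python
import numpy as np

# Hypothetical toy expression matrix M: rows are genes, columns are conditions.
genes = ["geneA", "geneB", "geneC"]
conditions = ["heat", "starvation", "oxidative", "osmotic"]
M = np.array([
    [2.0, 0.1, 1.8, 0.2],   # geneA: induced by heat and oxidative stress
    [1.9, 0.0, 2.1, 0.1],   # geneB: similar profile to geneA
    [0.1, 2.2, 0.0, 1.9],   # geneC: opposite profile
])

# np.corrcoef treats each row as one variable, so this is gene-by-gene.
corr = np.corrcoef(M)

for i, name in enumerate(genes):
    print(name, np.round(corr[i], 2))
```

Here geneA and geneB are strongly positively correlated, so under the slide's reasoning they would be candidates for a shared function, while geneC is anti-correlated with both.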

Fuzzy K-Means clustering

• Each gene can belong to many clusters

• Soft (fuzzy) assignment of genes to clusters

– Each gene has 1.0 membership units, allocated amongst clusters based on correlation with means

• Cluster means are calculated by taking the weighted average of all the genes in the cluster

Fuzzy K-Means clustering

Algorithm:

• Use PCA to initialize cluster means

• 3 iterations of fuzzy k-means clustering, finding k/3 clusters per iteration

– In each iteration, start with brand new clusters and initializations

• And a few more heuristic tricks

Initialization

• Use PCA to find a few eigenvectors for initialization

• These features capture the directions of maximum variance

• Must be orthonormal
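One way to obtain such eigenvectors is a singular value decomposition of the centered data, sketched below. This is an illustrative assumption about how the PCA step could be done, not necessarily how the original software does it. The function name `pca_initial_centroids` is hypothetical.

```python
import numpy as np

def pca_initial_centroids(X, n_centroids):
    """Return the first n_centroids principal directions of X (genes x conditions).

    The rows of Vt from the SVD of the column-centered data are the eigenvectors
    of the condition-by-condition covariance matrix, ordered by decreasing
    variance, and they are orthonormal by construction.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_centroids]

# Toy data: 50 hypothetical genes under 6 conditions, k = 6 so k/3 = 2 centroids.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 6))
V_init = pca_initial_centroids(X_demo, n_centroids=2)
```

Each returned row has one entry per condition, so it can be used directly as a starting expression pattern for a centroid.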

Example

Initialization

• k/3 centroids defined from the first k/3 eigenvectors

Example

• First iteration of clustering

Iteration of the approach

• Remove genes that have a Pearson correlation with a particular cluster greater than 0.7

– Intuition: the strong signal from these genes has already been accounted for

• Repeat
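The removal step can be sketched as a filter on the gene-by-centroid correlation matrix. This is a minimal example that reads "a particular cluster" as "any one cluster", which is an assumption. The function name `remove_explained_genes` is hypothetical.

```python
import numpy as np

def remove_explained_genes(X, V, cutoff=0.7):
    """Drop genes whose Pearson correlation with any centroid exceeds cutoff.

    X is genes x conditions, V is clusters x conditions.
    Returns the remaining rows of X and the boolean mask of kept genes.
    """
    n_genes = X.shape[0]
    # np.corrcoef stacks the rows of X and V; slice out the gene-by-centroid block.
    corr = np.corrcoef(X, V)[:n_genes, n_genes:]
    keep = corr.max(axis=1) <= cutoff
    return X[keep], keep

# Toy example: gene 0 matches the centroid and is removed; the others stay.
X_demo = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 1.5],
])
V_demo = np.array([[1.0, 2.0, 3.0, 4.0]])
X_left, kept_mask = remove_explained_genes(X_demo, V_demo)
```

The next round of clustering then runs on `X_left` only, so weaker patterns are no longer dominated by the genes already explained.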

Removing Duplicate Centroids

• Centroids with Pearson correlation > 0.9 will be averaged.

• Allows selecting a large initial number of clusters, since duplicates will be removed
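A sketch of the duplicate-removal step follows. The slide only states that centroids correlated above 0.9 are averaged; merging the most correlated pair first and repeating until no pair exceeds the cutoff is an assumption made for this example. The function name `merge_duplicate_centroids` is hypothetical.

```python
import numpy as np

def merge_duplicate_centroids(V, cutoff=0.9):
    """Repeatedly average the most correlated centroid pair until none exceeds cutoff."""
    V = [np.asarray(v, dtype=float) for v in V]
    while len(V) > 1:
        corr = np.corrcoef(np.vstack(V))
        np.fill_diagonal(corr, -np.inf)          # ignore self-correlation
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[i, j] <= cutoff:
            break
        merged = (V[i] + V[j]) / 2.0
        # Drop the pair and append their average as a single centroid.
        V = [v for idx, v in enumerate(V) if idx not in (i, j)] + [merged]
    return np.vstack(V)

# Toy example: the first two centroids are near-duplicates, the third is distinct.
V_demo = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.0, 3.1, 4.0],
    [4.0, 3.0, 2.0, 1.0],
])
V_merged = merge_duplicate_centroids(V_demo)
```

Because near-duplicates collapse into one centroid, starting with more clusters than expected costs little, which is the point made on the slide.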

Repeat 3 times

Output

1) Cluster means

2) Gene assignments to clusters

• Regulatory systems that govern the expression of overlapping sets of genes in yeast.

Fuzzy K means ADVANTAGES

• The method can present overlapping clusters, revealing distinct features of each gene’s function and regulation.

• The resulting implications can be used to assign refined hypothetical functions to uncharacterized gene products and to suggest additional cellular roles for well-studied proteins.

Fuzzy K means ADVANTAGES

• It presents more comprehensive groups of conditionally co-regulated genes.

• It elucidates the environmental conditions that trigger changes in gene expression.

• It requires no a priori information about the dataset.

Fuzzy K means DISADVANTAGES

• Assigning genes to a cluster requires a user-defined cutoff, and selecting a meaningful cutoff is a challenge.

• Fuzzy K means failed to identify a small number of groups that were identified by hierarchical clustering.
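The cutoff problem in the first bullet can be seen in a small sketch: the same membership table yields different gene-to-cluster assignments depending on the threshold. The membership values, the two cutoffs and the function name `assign_by_cutoff` are all hypothetical.

```python
import numpy as np

def assign_by_cutoff(memberships, cutoff):
    """Return, for each gene, the indices of clusters whose membership meets the cutoff."""
    return [np.flatnonzero(row >= cutoff).tolist() for row in memberships]

# Hypothetical memberships for 3 genes over 3 clusters (each row sums to 1).
memberships = np.array([
    [0.70, 0.25, 0.05],   # clearly in cluster 0
    [0.45, 0.45, 0.10],   # shared between clusters 0 and 1
    [0.34, 0.33, 0.33],   # no clear home
])

strict = assign_by_cutoff(memberships, cutoff=0.4)
loose = assign_by_cutoff(memberships, cutoff=0.3)
```

With the stricter cutoff the third gene is left unassigned, while the looser one places it in all three clusters, so the choice of cutoff changes the biological reading of the result.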

My opinion

• The unique advantages of fuzzy K means clustering make the technique a valuable tool for gene expression analysis. Its flexibility can be used to reveal more complex correlations between gene expression patterns, promoting refined hypotheses about the role and regulation of gene expression changes.

• To overcome these limitations, combining hierarchical clustering with fuzzy K means can be useful.

Thank you !