Faithful Sampling for Spectral Clustering to Analyze High Throughput Flow Cytometry Data

Parisa Shooshtari

School of Computing Science, Simon Fraser University, Burnaby

Brinkman’s Lab, Terry Fox Laboratory, BC Cancer Agency, Vancouver

Outline:

• Flow Cytometry (FCM) Data
• Clustering of FCM Data
• Spectral Clustering
• Faithful Sampling for Spectral Clustering
• Result
• Summary

Basics of Flow Cytometry Technique

[Figure: flow cytometry schematic. Labels: Sample; axes Wavelength and Intensity; markers MHC-II and CD-11c; measured intensities Int-1 and Int-2.]

Cell Population Identification in Flow Cytometry (FCM)

[Figure: gating example on scatter plots with axes Parameter 1 to Parameter 4; a gated population is annotated "X%".]

Adapted from the Science Creative Quarterly (2)

Importance of FCM Data Clustering

• Manual gating:
  – Is subjective
  – Is error-prone
  – Is time-consuming
  – Ignores the multivariate nature of the data

• Analyzing large FCM data sets (with up to 19 dimensions and 1,000,000 points) is impractical without the aid of automated techniques

Which Clustering Algorithm Is Suitable?

• Model-based algorithms such as flowClust, flowMerge and FLAME are not suitable for clusters with non-elliptical shapes.

[Figure: flowMerge result versus a good clustering; axis label: GFP.]

Our Motivation for Using Spectral Clustering

• Spectral clustering does not require any a priori assumption on cluster size, shape or distribution

• It is not sensitive to outliers, noise or the shape of the clusters

Spectral Clustering in One Slide

Represent the data set by a similarity graph.

Construct the graph:
• Vertices: data points p1, p2, …, pn
• Weights of edges: similarity values Si,j between points pi and pj

Clustering: find a cut through the graph
• Define a cut objective function
• Solve it
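The two steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation. The transcript does not preserve the similarity formula, so a Gaussian kernel with bandwidth `sigma` is assumed here. A normalized-cut style objective is also assumed, relaxed to the second eigenvector of the normalized affinity matrix. The function names are hypothetical, and the eigenvector is found by plain power iteration only to keep the example dependency-free; a linear-algebra library would normally do this step.

```python
import math

def gaussian_similarity(p, q, sigma):
    """Assumed edge weight: exp(-||p - q||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def spectral_bipartition(points, sigma=1.0, iterations=500):
    """Split points into two groups by a relaxed normalized cut (sketch)."""
    n = len(points)
    # Step 1: similarity graph. S[i][j] is the weight of edge (p_i, p_j).
    S = [[gaussian_similarity(points[i], points[j], sigma) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in S]
    # Normalized affinity N = D^(-1/2) S D^(-1/2); its eigenvalues lie in [-1, 1].
    N = [[S[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    # The top eigenvector of N is proportional to sqrt(degree); it carries no cut.
    top = [math.sqrt(d) for d in degree]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]
    # Step 2: power iteration on (I + N) / 2 with the top eigenvector removed
    # converges to the second eigenvector, whose signs define the cut.
    # The ramp start is arbitrary; it only needs a component along that vector.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        w = [0.5 * (v[i] + sum(N[i][j] * v[j] for j in range(n)))
             for i in range(n)]
        w_norm = math.sqrt(sum(x * x for x in w))
        v = [x / w_norm for x in w]
    # Rescale by D^(-1/2) and threshold at zero to read off the two sides.
    return [1 if v[i] / math.sqrt(degree[i]) > 0 else 0 for i in range(n)]

# Two small groups of points, well separated along the x axis.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (4.0, 0.0), (4.5, 0.0), (4.0, 0.5), (4.5, 0.5)]
labels = spectral_bipartition(points, sigma=1.0)
print(labels)
```

The label values themselves are arbitrary; only the grouping matters. Building `S` already takes memory quadratic in the number of points, which is the cost discussed on the next slide.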

The Bottleneck of Spectral Clustering

• Serious empirical barriers when applying this algorithm to large datasets

• Time complexity: O(n^3) → 2 years for 300,000 data points (cells)

• Required memory: O(n^2) → 5 terabytes for 300,000 data points (cells)
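The scaling behind these figures can be checked with a short calculation. This is a rough sketch under stated assumptions: a dense similarity matrix, 8 bytes per entry, and a nominal 10^9 operations per second. None of these constants come from the slides, so the absolute numbers printed here will not match the slide's figures. Only the growth rates carry over. The function name and the reduced size of 3,000 samples are hypothetical.

```python
def dense_cost(n, bytes_per_entry=8, ops_per_second=1e9):
    """Estimate memory (bytes) and time (seconds) for dense spectral clustering.

    Assumes an n x n similarity matrix and an O(n^3) eigen-decomposition
    with constant factor 1; both constants are illustrative assumptions.
    """
    memory_bytes = n * n * bytes_per_entry
    seconds = n ** 3 / ops_per_second
    return memory_bytes, seconds

full_mem, full_time = dense_cost(300_000)  # data set size quoted on the slide
small_mem, small_time = dense_cost(3_000)  # hypothetical reduced sample size

print("memory ratio:", full_mem // small_mem)
print("time ratio:", full_time / small_time)
```

Shrinking the input by a factor of 100 cuts memory by a factor of 100^2 and time by a factor of 100^3, which is why reducing the number of points before clustering is so effective.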

Faithful Sampling: Our Solution for Applying Spectral Clustering to Large Data

• Uniform sampling: low-density populations close to dense ones may not remain distinguishable.

• Faithful sampling: tends to choose more samples from low-density parts of the data than uniform sampling does.

How Does Our Faithful Sampling Preserve Information?

1. Space-uniform sampling: it preserves low-density parts of the data by selecting more samples from them than uniform sampling would.

2. Keeping the list of points in the neighbourhood of each sample: this list is later used to define similarities between communities.
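The two ideas above can be sketched in a few lines. The slides do not spell out the procedure, so this sketch assumes a fixed-radius rule: points are visited in random order, each point not yet registered becomes a sample, and it absorbs every unregistered point within `radius` into its community. The function name, the `radius` parameter and the test data are hypothetical.

```python
import math
import random

def faithful_sampling(points, radius, seed=0):
    """Return {sample_index: [member indices]} under an assumed fixed-radius rule."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # visit points in random order
    registered = [False] * len(points)
    communities = {}
    for idx in order:
        if registered[idx]:
            continue  # already belongs to an earlier sample's community
        # idx becomes a sample and keeps the list of its neighbourhood points.
        members = [j for j in range(len(points))
                   if not registered[j]
                   and math.dist(points[idx], points[j]) <= radius]
        for j in members:
            registered[j] = True
        communities[idx] = members
    return communities

# A dense population next to a much sparser one (synthetic data).
rng = random.Random(1)
dense = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(300)]
sparse = [(rng.gauss(3, 0.3), rng.gauss(3, 0.3)) for _ in range(20)]
points = dense + sparse

communities = faithful_sampling(points, radius=0.5)
print(len(points), "points reduced to", len(communities), "samples")
```

Because samples end up more than `radius` apart, their number depends on the volume a population occupies rather than on its point count, so sparse regions keep proportionally more representatives than under uniform sampling. The stored member lists are what a later step would use to define similarities between communities.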

Clustering Result

• Low-density populations surrounded by dense ones

Clustering Result

• Populations with non-elliptical shapes
• Subpopulations of a major population

[Figure panels: SamSPECTRAL, flowMerge, FLAME]

Summary

• Spectral clustering can now be applied to large data sets by means of our proposed faithful (information-preserving) sampling.

• This sampling method can be combined with other graph-based clustering algorithms, with different objective functions, to reduce the size of the data.

• We have shown that SamSPECTRAL has an advantage over model-based clustering methods in identifying:
  – Cell populations with non-elliptical shapes
  – Low-density populations surrounded by dense ones
  – Sub-populations of a major population

Acknowledgement

• Committee:
  – Dr. Arvind Gupta
  – Dr. Ryan Brinkman
  – Dr. Tobias Kollman

• Co-authors on SamSPECTRAL:
  – Habil Zare

• Data providers:
  – Connie Eaves
  – Peter Landsdrop
  – Keith Humphries

Thanks for Your Attention!
