Faithful Sampling for Spectral Clustering to Analyze High
Throughput Flow Cytometry Data
Parisa Shooshtari
School of Computing Science, Simon Fraser University, Burnaby
Brinkman’s Lab, Terry Fox Laboratory, BC Cancer Agency, Vancouver
Outline:
• Flow Cytometry (FCM) Data
• Clustering of FCM data
• Spectral Clustering
• Faithful Sampling for Spectral Clustering
• Result
• Summary
Basics of Flow Cytometry Technique
[Figure: flow cytometry schematic. Labels: Sample; plots of Intensity against Wave Length; markers MHC-II and CD-11c with measured intensities Int-1 and Int-2.]
Cell Population Identification in Flow Cytometry (FCM)
[Figure: scatter plots with axes Parameter 1, Parameter 2, Parameter 3 and Parameter 4; a cell population is labelled X%. Adapted from the Science Creative Quarterly (2).]
Importance of FCM Data Clustering
• Manual gating is
– subjective
– error-prone
– time-consuming
– blind to the multivariate nature of the data
• Analyzing large FCM data sets (up to 19 dimensions and 1,000,000 points) is impractical without the aid of automated techniques
Which Clustering Algorithm Is Suitable?
• Model-based algorithms such as flowClust, flowMerge and FLAME are not suitable for clusters with non-elliptical shapes.
[Figure: two panels titled "FlowMerge" and "A Good Clustering"; axis label: GFP.]
Our Motivation for Using Spectral Clustering
• Spectral clustering does not require any a priori assumption about cluster size, shape or distribution
• It is not sensitive to outliers, noise or the shape of clusters
Spectral Clustering in One Slide
Represent the data set by a similarity graph
Construct the graph:
• Vertices: data points p1, p2, …, pn
• Weights of edges: similarity values Si,j between points pi and pj [defining equation not recovered from the slide]
Clustering: find a cut through the graph
• Define a cut objective function
• Solve it
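The graph construction and cut steps above can be sketched in a few lines. This is only an illustrative sketch, not the method used in the talk: the Gaussian kernel for the edge weights, the bandwidth `sigma`, and the choice of a normalized-cut relaxation (splitting on the sign of the second eigenvector of the normalized Laplacian) are all assumptions, since the slide's own weight formula and objective function are not preserved here. The function names are hypothetical.

```python
import numpy as np


def gaussian_similarity(points, sigma):
    """Dense similarity matrix with an assumed Gaussian kernel.

    S[i, j] = exp(-||p_i - p_j||^2 / (2 * sigma^2)); the kernel and
    `sigma` are illustrative choices, not taken from the slides.
    """
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def spectral_bipartition(similarity):
    """Two-way split via a relaxed normalized cut (assumed objective)."""
    degree = similarity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} S D^{-1/2}
    laplacian = (np.eye(len(similarity))
                 - d_inv_sqrt[:, None] * similarity * d_inv_sqrt[None, :])
    # eigh returns eigenvalues in ascending order; column 1 belongs to
    # the second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    # The sign of each entry assigns the vertex to one side of the cut.
    return (fiedler > 0).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = np.vstack([
        rng.normal([0.0, 0.0], 0.3, size=(20, 2)),
        rng.normal([4.0, 0.0], 0.3, size=(20, 2)),
    ])
    print(spectral_bipartition(gaussian_similarity(points, sigma=1.0)))
```

Both the n-by-n similarity matrix and the eigendecomposition are dense here, which is why the approach only works directly on small inputs.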
The Bottleneck of Spectral Clustering
• Serious practical barriers arise when applying this algorithm to large data sets
• Time complexity: O(n^3) → 2 years for 300,000 data points (cells)
• Required memory: O(n^2) → 5 terabytes for 300,000 data points (cells)
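To give a feel for the scaling, the short calculation below evaluates n^2 and n^3 for n = 300,000. It is a back-of-the-envelope sketch: the 8 bytes per matrix entry (double precision) is an assumption, and the running time and total memory quoted on the slide depend on the implementation and hardware, which the source does not specify.

```python
n = 300_000  # number of cells, as on the slide

# O(n^2): one similarity value per pair of points
pairwise_entries = n ** 2

# O(n^3): order of magnitude of the operation count
cubic_operations = n ** 3

# Size of ONE dense similarity matrix, assuming 8-byte floats
dense_float64_bytes = pairwise_entries * 8

print(f"pairwise entries : {pairwise_entries:.2e}")
print(f"cubic operations : {cubic_operations:.2e}")
print(f"one dense matrix : {dense_float64_bytes / 1e12:.2f} TB")
```

A single dense copy of the matrix already falls well outside the memory of an ordinary workstation, before any working storage for the eigendecomposition is counted.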
Faithful Sampling: Our Solution for Applying Spectral Clustering to Large Data
• Uniform Sampling: low-density populations close to dense ones may not remain distinguishable
• Faithful Sampling: tends to choose more samples from the non-dense parts of the data
How Does Our Faithful Sampling Preserve Information?
1. Space Uniform Sampling: it preserves the low-density parts of the data by selecting more samples from them than uniform sampling does.
2. Keeping the list of points in the neighbourhood of samples: this list is used to define similarities between communities.
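The two ideas above can be illustrated with a small sketch. It assumes one plausible reading of space-uniform sampling: repeatedly pick a not-yet-covered point at random as a sample, and record every not-yet-covered point within a fixed radius as its community. The function name `faithful_sample`, the radius parameter `h` and the use of Euclidean distance are hypothetical choices for illustration; the slides do not spell out the exact procedure.

```python
import numpy as np


def faithful_sample(points, h, seed=0):
    """Space-uniform sampling sketch (assumed procedure, see lead-in).

    Returns the indices of the chosen samples and, for each sample,
    the indices of the points recorded in its neighbourhood.
    """
    rng = np.random.default_rng(seed)
    unassigned = np.ones(len(points), dtype=bool)
    samples, communities = [], []
    while unassigned.any():
        candidates = np.flatnonzero(unassigned)
        # Pick a random point that no earlier sample has covered.
        rep = rng.choice(candidates)
        dist = np.linalg.norm(points[candidates] - points[rep], axis=1)
        # Its community: uncovered points within radius h (includes rep).
        members = candidates[dist <= h]
        unassigned[members] = False
        samples.append(int(rep))
        communities.append(members)
    return samples, communities


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dense = rng.normal([0.0, 0.0], 1.0, size=(5000, 2))
    sparse = rng.normal([10.0, 10.0], 1.0, size=(50, 2))
    data = np.vstack([dense, sparse])
    samples, communities = faithful_sample(data, h=0.5)
    in_sparse = sum(index >= 5000 for index in samples)
    print(len(samples), "samples,", in_sparse, "from the sparse population")
```

Because samples end up more than `h` apart, a dense region cannot contribute many more samples than a sparse region of similar extent, so small populations stay represented. The stored communities keep the information about how many original points each sample stands for.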
Clustering Result
• Low-density populations surrounded by dense ones
Clustering Result
• Populations with non-elliptical shapes
• Subpopulations of a major population
[Figure: clustering results in three panels titled SamSPECTRAL, flowMerge and FLAME.]
Summary
• Spectral clustering can now be applied to large data sets by means of our proposed Faithful (Information Preserving) sampling.
• This sampling method can be combined with other graph-based clustering algorithms, with different objective functions, to reduce the size of the data.
• We have shown that SamSPECTRAL has an advantage over model-based clustering methods in identifying
– cell populations with non-elliptical shapes
– low-density populations surrounded by dense ones
– sub-populations of a major population
Acknowledgement
• Committee:
– Dr. Arvind Gupta
– Dr. Ryan Brinkman
– Dr. Tobias Kollman
• Co-authors on SamSPECTRAL
– Habil Zare
• Data Providers
– Connie Eaves
– Peter Landsdrop
– Keith Humphries
Thanks for Your Attention!