IVFC Signal Denoising


Description: IVFC slides to Honwai Leong.

Transcript of IVFC Signal Denoising

Page 1: IVFC Signal Denoising

Introduction | Methods and Results | Summary

Cell Counting on In Vivo Flow Cytometry Time Series Data

Chaofeng Wang

March 25, 2011

Chaofeng Wang Cell Counting on In Vivo Flow Cytometry Time Series data

Page 2: IVFC Signal Denoising

Abstract

In this presentation, I introduce three methods for IVFC data analysis. The Line-Separating Method is the conventional and earliest method. Wavelet-based peak picking is an adaptive method inspired by audio processing. The statistical thresholding method uses a Gaussian Mixture Model to count cells automatically and consistently.

Page 3: IVFC Signal Denoising

In Vivo Flow Cytometry (IVFC)

Excitation and detection take place in the same confocal plane. Output: time-series data.¹

¹ For IVFC settings, refer to [9].

Page 4: IVFC Signal Denoising

In Vivo Flow Cytometry (IVFC)

Capabilities [9]:

- Real-time cell counting (vs. hemocytometer)

- Suitable for fast-moving cells and low-SNR signals (vs. confocal and two-photon imaging), with a 5 to 100 kHz sampling rate.

- Monitoring of cell kinetics in vivo (without blood extraction)

Most applications are in metastasis research [7, 13].

Page 5: IVFC Signal Denoising

Causes of Low SNR

1. Autofluorescence

2. Nonspecific labeling caused by incomplete cleansing

3. Labeled cells deviating from the confocal plane

4. Non-uniform staining

5. Instability of fluorescent dyes in long-term assays

6. Labeled cells may aggregate: 2 of 119 images of labeled cells show potentially clustered cells [8]

7. Instrumental noise and white noise

Page 6: IVFC Signal Denoising

Conventional Gating: Line Separating Method (LSM)

Line Separating GUI v2.0, with adjustable thresholds.

The FWHM is computed in whole samples, so its values are discrete.

MATLAB scripts by Chaofeng Wang.

Page 7: IVFC Signal Denoising

Discrete FWHM
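The slides give no formula for the discrete FWHM. A minimal sketch, assuming it is the number of contiguous samples around the peak maximum whose value is at least half the peak height, with the baseline already at 0; `discrete_fwhm` is a hypothetical helper, not the original MATLAB script.

```python
def discrete_fwhm(signal, peak_index):
    """Width in samples of the contiguous run around peak_index that stays >= half maximum.

    Assumption: the peak height is measured from 0 (baseline already removed).
    """
    half = signal[peak_index] / 2.0
    left = peak_index
    # walk left while the neighbouring sample is still at or above half maximum
    while left > 0 and signal[left - 1] >= half:
        left -= 1
    right = peak_index
    # walk right in the same way
    while right < len(signal) - 1 and signal[right + 1] >= half:
        right += 1
    # an integer count of samples: this is where the discreteness comes from
    return right - left + 1


trace = [0.0, 0.1, 0.4, 0.8, 1.0, 0.7, 0.3, 0.0]
print(discrete_fwhm(trace, 4))
```

Because the width is an integer count of samples, many peaks share the same FWHM value, which is why the feature is discrete.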

Page 8: IVFC Signal Denoising

Line Separating Method (LSM)

Gating Strategies

- Background assaying: control data

- Manual selection of noise segments from experiment data

- Expert adjustment (subjective)

- Peak height vs. full width at half maximum (FWHM) feature space, separated by a straight line (which underfits; a hyperbola, y = x^{-1} + a?)²

² LSM was proposed with the invention of IVFC by Novak et al. [10].
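A minimal sketch of line gating in the (FWHM, peak height) feature space. Assumptions: the boundary is written as height = m × FWHM + c, peaks above it count as cells, and `m`, `c`, and `a` stand in for the expert-adjusted thresholds; the hyperbola from the last bullet is included for comparison. The function names and parameter values are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (fwhm_in_samples, peak_height)


def lsm_gate(peaks: List[Peak], m: float, c: float) -> List[Peak]:
    """Keep peaks lying above the straight line height = m * fwhm + c (assumed convention)."""
    return [(w, h) for (w, h) in peaks if h > m * w + c]


def hyperbola_gate(peaks: List[Peak], a: float) -> List[Peak]:
    """Alternative boundary height = 1 / fwhm + a, as questioned on the slide (hypothetical)."""
    return [(w, h) for (w, h) in peaks if w > 0 and h > 1.0 / w + a]


peaks = [(1, 1.1), (2, 0.3), (3, 1.2), (6, 0.9), (8, 2.0)]
print(lsm_gate(peaks, m=-0.1, c=1.0))
print(hyperbola_gate(peaks, a=0.5))
```

With these illustrative parameters, the very narrow peak at FWHM 1 passes the straight line but not the hyperbola, the kind of narrow, noise-like peak a straight-line boundary can let through.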

Page 9: IVFC Signal Denoising

Wavelet Based Peak Picking

Two steps:

1. Wavelet denoising.

2. Adaptive peak picking.

This work was contributed by David Damm and presented at the BMEI 2009 conference [4].

Page 10: IVFC Signal Denoising

Wavelet Denoising

Noise model: recover an unknown function $f$ on $[0, 1]$ from noisy data

$$d_i = f(t_i) + \sigma z_i, \quad i = 0, \dots, n-1,$$

where $t_i = i/n$, $z_i$ is standard Gaussian white noise ($z_i \sim N(0, 1)$, i.i.d.), and $\sigma$ is the noise level.

Denoising aim: minimize the mean squared error subject to the condition that $\hat{f}$ is, with high probability, at least as smooth as $f$.³

³ References: [6, 5]

Page 11: IVFC Signal Denoising

Soft thresholding

Apply the soft-thresholding nonlinearity coordinatewise to the empirical wavelet coefficients:

$$\eta_t(y) = \operatorname{sgn}(y)\,(|y| - t)_+,$$

where $(x)_+ = 0$ if $x < 0$ and $(x)_+ = x$ if $x \ge 0$, and $t$ is a specially chosen threshold:

$$t_n = \gamma_1 \, \sigma \sqrt{2 \log(n) / n}.$$

$\gamma_1$ is a constant, set to 1 in simpler situations. In practice $\sigma$ is unknown, and the estimate $\hat{\sigma} = \mathrm{MAD}/0.6745$ is used.⁴

⁴ Reference: [5]
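A self-contained sketch of wavelet denoising with soft thresholding. Assumptions: a Haar wavelet with 4 levels (the original work used the MATLAB Wavelet Toolbox, and its wavelet and level choices are not given here), $\hat{\sigma}$ estimated from the finest-scale detail coefficients as in Donoho's setup, and the coarse approximation left untouched. The slide's $t_n$ is written for coefficients scaled by $1/\sqrt{n}$; for an orthonormal transform applied to the raw samples, the same threshold becomes $\gamma_1 \sigma \sqrt{2 \log n}$, which is what the code uses.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)


def haar_forward(x, levels):
    """Orthonormal multi-level Haar DWT: returns (coarse approximation, details finest-first)."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Undo haar_forward level by level, coarsest first."""
    x = list(approx)
    for d in reversed(details):
        x = [v for a, c in zip(x, d) for v in ((a + c) / SQRT2, (a - c) / SQRT2)]
    return x


def soft_threshold(y, t):
    """eta_t(y) = sgn(y) * (|y| - t)_+"""
    return math.copysign(max(abs(y) - t, 0.0), y)


def wavelet_denoise(d, levels=4, gamma1=1.0):
    n = len(d)
    approx, details = haar_forward(d, levels)
    # sigma_hat = MAD / 0.6745 on the finest-scale details, which are mostly noise
    finest = details[0]
    med = statistics.median(finest)
    sigma_hat = statistics.median([abs(c - med) for c in finest]) / 0.6745
    # orthonormal transform of the raw samples: threshold gamma1 * sigma * sqrt(2 ln n)
    t = gamma1 * sigma_hat * math.sqrt(2.0 * math.log(n))
    shrunk = [[soft_threshold(c, t) for c in level] for level in details]
    return haar_inverse(approx, shrunk)


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


random.seed(0)
n = 1024
clean = [4.0 if (i // 128) % 2 else 0.0 for i in range(n)]
noisy = [c + random.gauss(0.0, 0.5) for c in clean]
denoised = wavelet_denoise(noisy)
print(mse(noisy, clean), mse(denoised, clean))
```

On this blocky test signal the denoised error should be far below the noisy error; traces with sharp, narrow peaks such as IVFC data may call for a smoother wavelet and a tuned $\gamma_1$.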

Page 12: IVFC Signal Denoising

Adaptive Peak Picking

Finite State Automaton

In states A1 and P1, the accumulated discrete derivative is reset to 0. A peak is reported whenever state D2 is reached.

Page 13: IVFC Signal Denoising

Adaptive Peak Picking

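The threshold baseline is calculated in a rolling window $[t - l/2,\ t + l/2]$ of fixed (even integer) size $l$:

$$B(t) = \mathrm{Median}_w + \mathrm{Std}_w,$$

where the median and standard deviation are taken over the samples in the window $w$.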
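A minimal sketch of the rolling threshold baseline. Assumptions: the window is truncated at the ends of the record, Std is the population standard deviation, and the helper that keeps local maxima above $B(t)$ is a hypothetical stand-in for illustration; it does not reproduce the finite state automaton.

```python
import statistics


def rolling_baseline(x, l):
    """B(t) = Median_w + Std_w over w = x[t - l/2 .. t + l/2], truncated at the record ends."""
    if l <= 0 or l % 2 != 0:
        raise ValueError("window size l must be a positive even integer")
    half = l // 2
    n = len(x)
    baseline = []
    for t in range(n):
        w = x[max(0, t - half): min(n, t + half + 1)]
        # population standard deviation: an assumption, the slide does not name the estimator
        baseline.append(statistics.median(w) + statistics.pstdev(w))
    return baseline


def peaks_above_baseline(x, l):
    """Hypothetical use: local maxima of x that exceed B(t) (not the slide's automaton)."""
    b = rolling_baseline(x, l)
    return [t for t in range(1, len(x) - 1)
            if x[t] > b[t] and x[t - 1] <= x[t] > x[t + 1]]


trace = [0, 0, 0, 5, 0, 0, 0, 0, 0, 0]
print(peaks_above_baseline(trace, 4))
```

Adding the local standard deviation to the median raises the threshold automatically in noisy stretches, which is what makes the baseline adaptive.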

Page 14: IVFC Signal Denoising

Wavelet Based Peak Picking

The MATLAB Wavelet Toolbox was used in this research.

Page 15: IVFC Signal Denoising

Wavelet Method in comparison to LSM

Table: Comparison of cell counts by the wavelet method and LSM

Dataset   LSM   Wavelet   Consensus
1-1.dcf   80    162       79
1-2.dcf   71    153       70
2-1.dcf   30    42        13
2-2.dcf   41    59        20
3-1.dcf   175   175       135
3-2.dcf   81    157       77
5-1.dcf   36    67        34
5-6.dcf   59    69        46

Page 16: IVFC Signal Denoising

Statistical Modeling for IVFC data peaks

Disadvantages of LSM:

- Subjective and labour-intensive: a control measurement is always needed.

- Susceptible to outliers in the control.

- The control loses its thresholding power when assays last for days.

- Experts may choose inconsistent thresholds.

We propose a thresholding method that

- achieves consistency and robustness, and

- is based on statistical modeling, providing a kind of ground truth for other fast cell-counting methods.

Page 17: IVFC Signal Denoising

The histogram of IVFC data

Skewed to the right. Lognormal, i.e. $\log(\text{data}) \sim N(\mu, \sigma^2)$?

Page 18: IVFC Signal Denoising

The histogram of IVFC log(data)

All values ≤ 0 are discarded.

Page 19: IVFC Signal Denoising

Automatic Classifiers for Flow Cytometry

Pyne et al.: robust skew-t mixture models (FLAME) [12]

Chan et al.: extracted biologically meaningful cell subsets by defining putative cell subsets as groups of mixture components [2]

In the machine-learning category, vector quantization methods are used [3, 11].

Page 20: IVFC Signal Denoising

Statistical Thresholding Method (STM)

Assumptions:

- Noise peaks are the majority and cluster well.

- Cell peaks are a minority and behave as outliers.

- All peaks can be modeled by two or more Gaussian mixture components.

Page 21: IVFC Signal Denoising

Gaussian Mixture Model (GMM)

Assume there are K groups in the data; the GMM accordingly has K components:

$$p(x) = \sum_{k=1}^{K} p(k)\, p(x \mid k) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),$$

where $\pi_k = p(k)$ is the proportion of component $k$ in the whole data.⁵

⁵ Reference: [1]
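A one-dimensional sketch of the mixture density, since IVFC peak heights are scalars and $\Sigma_k$ reduces to a variance. The three components and their parameters are illustrative values, not fitted IVFC results; the Riemann-sum check only confirms that $p(x)$ integrates to 1.

```python
import math


def normal_pdf(x, mu, var):
    """N(x | mu, var) for scalar x; in one dimension Sigma_k is a variance."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_pdf(x, weights, means, variances):
    """p(x) = sum_k pi_k N(x | mu_k, Sigma_k)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing proportions pi_k must sum to 1")
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


# illustrative 3-component mixture (not fitted to any IVFC data)
weights = [0.6, 0.3, 0.1]
means = [-2.0, -1.0, 1.0]
variances = [0.25, 0.5, 1.0]

# Riemann sum over a wide interval: a valid density integrates to 1
step = 0.01
area = sum(gmm_pdf(-15.0 + i * step, weights, means, variances) for i in range(3001)) * step
print(area)
```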

Page 22: IVFC Signal Denoising

Expectation Maximization for GMM

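1. Expectation step:

$$\gamma(i, k) = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)},$$

where $\gamma(i, k)$ is the probability that $x_i$ comes from component $k$.

2. Maximization step:

$$\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\, x_i, \qquad \Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\,(x_i - \mu_k)(x_i - \mu_k)^{T},$$

where $N_k = \sum_{i=1}^{N} \gamma(i, k)$, and $\pi_k$ is estimated as $N_k / N$. The two steps are repeated until the likelihood converges.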
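A one-dimensional sketch of the EM iteration, since IVFC peak heights are scalars and $\Sigma_k$ reduces to a variance $\sigma_k^2$. It follows the two steps above, adds small floors against underflow and collapsing variances (an assumption), and starts from a caller-supplied guess rather than the quantile-based initialization used by STM; `em_gmm_1d` is a hypothetical helper, not the original MATLAB code.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(xs, weights, means, variances):
    total = 0.0
    for x in xs:
        s = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(s, 1e-300))  # guard against underflow for far outliers
    return total


def em_gmm_1d(xs, weights, means, variances, max_iter=500, tol=1e-8):
    """EM for a 1-D GMM from an initial guess; returns (weights, means, variances, log L)."""
    n, K = len(xs), len(means)
    weights, means, variances = list(weights), list(means), list(variances)
    prev = -math.inf
    for _ in range(max_iter):
        # E-step: gamma[i][k] = pi_k N(x_i | mu_k, var_k) / sum_j pi_j N(x_i | mu_j, var_j)
        gamma = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(K)]
            s = max(sum(p), 1e-300)
            gamma.append([pk / s for pk in p])
        # M-step: N_k, then mu_k, then var_k with the updated mu_k, then pi_k = N_k / N
        for k in range(K):
            nk = max(sum(g[k] for g in gamma), 1e-12)
            means[k] = sum(g[k] * x for g, x in zip(gamma, xs)) / nk
            var_k = sum(g[k] * (x - means[k]) ** 2 for g, x in zip(gamma, xs)) / nk
            variances[k] = max(var_k, 1e-9)
            weights[k] = nk / n
        cur = log_likelihood(xs, weights, means, variances)
        if cur - prev < tol:  # stop once the likelihood no longer improves
            break
        prev = cur
    return weights, means, variances, cur


random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(600)] + [random.gauss(6.0, 1.0) for _ in range(400)]
w, mu, var, ll = em_gmm_1d(data, [0.5, 0.5], [-1.0, 7.0], [1.0, 1.0])
print(w, mu, var, ll)
```

EM converges to a local optimum, so the result depends on the initial guess.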

Page 23: IVFC Signal Denoising

Bayesian/Akaike Information Criterion (BIC, AIC)

AIC and BIC are criteria for deciding which model is best while avoiding overfitting and underfitting:

$$\mathrm{AIC} = 2k - 2\ln(L), \qquad \mathrm{BIC} = k \ln(n) - 2\ln(L),$$

where $k$ is the number of parameters, $L$ is the maximized likelihood, and $n$ is the sample size. The model with the lowest value is preferred.
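A small sketch of model selection over the number of GMM components. For a one-dimensional GMM with $K$ components the parameter count is $3K - 1$ ($K$ means, $K$ variances, $K - 1$ free weights). The log-likelihood values in the example are made up for illustration and are not IVFC results; `best_k` is a hypothetical helper.

```python
import math


def aic(log_l, k):
    """AIC = 2k - 2 ln L"""
    return 2 * k - 2 * log_l


def bic(log_l, k, n):
    """BIC = k ln n - 2 ln L"""
    return k * math.log(n) - 2 * log_l


def gmm_param_count(n_components, dim=1):
    """Free parameters of a full-covariance GMM: means, covariances, and K - 1 free weights."""
    K, d = n_components, dim
    return K * d + K * d * (d + 1) // 2 + (K - 1)


def best_k(log_likelihoods, n, criterion="bic"):
    """Return the K with the lowest criterion; log_likelihoods maps K -> maximized ln L."""
    def score(K):
        k = gmm_param_count(K)
        if criterion == "bic":
            return bic(log_likelihoods[K], k, n)
        return aic(log_likelihoods[K], k)
    return min(log_likelihoods, key=score)


# made-up maximized log-likelihoods for K = 1..4 (illustration only, not IVFC results)
example = {1: -1520.0, 2: -1410.0, 3: -1395.0, 4: -1393.5}
print(best_k(example, n=1000), best_k(example, n=1000, criterion="aic"))
```

BIC penalizes each extra parameter by $\ln n$ instead of 2, so for large samples it favours fewer components than AIC does.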

Page 24: IVFC Signal Denoising

BIC and AIC for the number of components K in the GMM

Page 25: IVFC Signal Denoising

Thresholding strategy

A 3-component GMM (3-GMM) is chosen for IVFC data.

The cell-peak component is too small and is treated as outliers, so the threshold is set on the noise component with the largest µ.

The threshold is set at µ2 + a × σ2, where µ2 and σ2 are the mean and standard deviation of the second component; a is called the sigma factor.

Page 26: IVFC Signal Denoising

Sigma Factor Picker

The picker aims to keep the false positive number (FPN) as small as possible.

Sample number N   Threshold µ + aσ   Φ(µ + aσ)        FPN for cell peaks
≤ 1               a = 1              0.841344746069   N.A.
≤ 100             a = 2              0.977249868052   ≤ 2
≤ 1000            a = 3              0.998650101968   ≤ 1
≤ 10^5            a = 4              0.999968328758   ≤ 3
≤ 10^7            a = 5              0.999999713348   ≤ 3
≤ 10^9            a = 6              0.999999999013   ≤ 1
≤ 10^12           a = 7              0.999999999999   ≤ 1

Table: Sigma Factor Picker
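The table can be reproduced from the standard normal CDF. In this sketch, `pick_sigma_factor` transcribes the table, and the printed last column is $N(1 - \Phi(a))$ rounded at the upper bound of each row, which appears to match the FPN column; that reading of the FPN column is an inference, not something the slide states. `stm_threshold` is a hypothetical helper for $\mu_2 + a\sigma_2$.

```python
import math

# (upper bound on the sample number N of D2, sigma factor a), transcribed from the table
SIGMA_FACTOR_TABLE = [(1, 1), (100, 2), (1000, 3), (10**5, 4), (10**7, 5), (10**9, 6), (10**12, 7)]


def normal_cdf(a):
    """Phi(a) for a standard normal, i.e. P(X <= mu + a*sigma) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def normal_tail(a):
    """1 - Phi(a), computed with erfc to keep precision far in the tail."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))


def pick_sigma_factor(n):
    for upper, a in SIGMA_FACTOR_TABLE:
        if n <= upper:
            return a
    raise ValueError("N is larger than the table covers")


def stm_threshold(mu2, sigma2, n2):
    """t = mu_2 + a * sigma_2, with a chosen from the sample number of the second component."""
    return mu2 + pick_sigma_factor(n2) * sigma2


for upper, a in SIGMA_FACTOR_TABLE[1:]:
    # expected count of Gaussian noise peaks above mu + a*sigma at the top of each row
    print(upper, a, f"{normal_cdf(a):.12f}", round(upper * normal_tail(a)))
```

The sigma factor grows with N so that the expected number of noise peaks crossing the threshold stays at a few, even though the noise component gets larger.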

Page 27: IVFC Signal Denoising

Keep FPN low

Page 28: IVFC Signal Denoising

STM procedures

1. Bring the baseline down to 0 and smooth: $v = v - b$, where $v$ is the input data and $b$ is the estimated baseline.

2. Shift-free filtering: $v_s = \mathrm{Convolve}(v, \mathrm{GKern}(l_{gk}))$, where $\mathrm{GKern}(l_{gk})$ is a Gaussian kernel of length $l_{gk}$.

3. Find all peaks (local maxima) of $v_s$, denoted $p$. These are the cell-peak candidates.

4. Use the [0.75, 0.95] quantiles as bounds to generate an initial guess, and use it to fit a 3-component Gaussian mixture model to $p$. In descending order, the components are D1, D2, D3.

5. $t = D2.\mu + sf \times D2.\sigma$. The sigma factor $sf$ is determined by the sample number of D2 according to the Sigma Factor Picker table.

6. All peaks in $p$ higher than $t$ are picked as cell peaks.

A MATLAB script for a graphical user interface of STM is available.
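A sketch of steps 1 to 3, 5 and 6 in plain Python. Assumptions: the baseline $b$ is supplied by the caller (the slides do not say how it is estimated), the Gaussian kernel has odd length with σ = $l_{gk}$/6, plateaus are reported at their left end, and D2's mean, standard deviation, and the sigma factor come from a separately fitted 3-GMM (step 4 is not reproduced). The slides also do not say whether the fit and threshold use log-scaled peak heights; the sketch thresholds the smoothed values directly. `stm_pick` and its helpers are hypothetical names.

```python
import math


def gaussian_kernel(length, sigma=None):
    """GKern(l_gk): normalized Gaussian taps; odd length and default sigma are assumptions."""
    if length < 1 or length % 2 == 0:
        raise ValueError("kernel length must be a positive odd integer")
    if sigma is None:
        sigma = length / 6.0  # about +/- 3 sigma inside the window
    centre = length // 2
    taps = [math.exp(-((i - centre) ** 2) / (2.0 * sigma ** 2)) for i in range(length)]
    total = sum(taps)
    return [t / total for t in taps]


def convolve_centred(x, kernel):
    """Same-size centred convolution: out[i] stays aligned with x[i], so peaks are not shifted.

    The kernel is symmetric, so correlation and convolution coincide.
    """
    centre = len(kernel) // 2
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - centre
            if 0 <= idx < n:  # zero padding at the edges
                acc += w * x[idx]
        out.append(acc)
    return out


def local_maxima(x):
    """Indices i with x[i-1] < x[i] >= x[i+1]; a plateau is reported at its left end."""
    return [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]


def stm_pick(v, b, d2_mean, d2_std, sf, l_gk=7):
    """Steps 1-3 and 5-6 of STM; D2's parameters come from a separate 3-GMM fit (step 4)."""
    detrended = [vi - bi for vi, bi in zip(v, b)]            # step 1: v = v - b
    vs = convolve_centred(detrended, gaussian_kernel(l_gk))  # step 2: shift-free smoothing
    p = local_maxima(vs)                                     # step 3: cell-peak candidates
    t = d2_mean + sf * d2_std                                # step 5: threshold
    return [i for i in p if vs[i] > t], t                    # step 6: peaks above t


v = [0.0] * 50
v[10], v[20], v[30] = 5.0, 0.2, 5.0
b = [0.0] * 50
cells, t = stm_pick(v, b, d2_mean=0.0, d2_std=0.1, sf=3)
print(cells, t)
```

The centred kernel is what makes the filtering shift-free: a causal filter would delay every smoothed peak by half the kernel length.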

Page 29: IVFC Signal Denoising

Simulated data

- 100 Gaussian-shaped peaks (in blue) with heights of 1 to 2 and FWHM of 5 to 9, evenly distributed over 10000 samples.

- Additive white Gaussian noise with SNR = 1.

- A baseline increasing from 0 to 1.
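A sketch of a generator for similar simulated data. Assumptions: peak centres are evenly spaced, heights and FWHMs are drawn uniformly from the stated ranges, the FWHM is converted to a Gaussian σ via FWHM = 2√(2 ln 2) σ, and SNR is taken as the linear ratio of mean peak-signal power to noise power, since the slide does not define it. `simulate_ivfc` is a hypothetical name.

```python
import math
import random


def simulate_ivfc(n_samples=10000, n_peaks=100, height=(1.0, 2.0), fwhm=(5.0, 9.0),
                  snr=1.0, baseline_end=1.0, seed=0):
    """Return (noisy trace, clean peaks, peak centres) for a synthetic IVFC-like record."""
    rng = random.Random(seed)
    clean = [0.0] * n_samples
    spacing = n_samples / n_peaks
    centres = [int(spacing * (j + 0.5)) for j in range(n_peaks)]  # evenly distributed
    for c in centres:
        h = rng.uniform(*height)
        s = rng.uniform(*fwhm) / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # FWHM -> sigma
        lo, hi = max(0, int(c - 4 * s)), min(n_samples, int(c + 4 * s) + 1)
        for i in range(lo, hi):
            clean[i] += h * math.exp(-((i - c) ** 2) / (2.0 * s * s))
    # assumption: SNR = mean signal power / noise power (a linear ratio, not dB)
    signal_power = sum(x * x for x in clean) / n_samples
    noise_std = math.sqrt(signal_power / snr)
    # baseline rising linearly from 0 to baseline_end
    baseline = [baseline_end * i / (n_samples - 1) for i in range(n_samples)]
    noisy = [p + b + rng.gauss(0.0, noise_std) for p, b in zip(clean, baseline)]
    return noisy, clean, centres


noisy, clean, centres = simulate_ivfc()
print(len(noisy), len(centres), round(max(clean), 3))
```

Keeping the clean trace and the true centres alongside the noisy one is what lets a counting method be scored against ground truth.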

Page 30: IVFC Signal Denoising

SNR Pressure Tests on Simulated Data

Page 31: IVFC Signal Denoising

Cell Peak Proportion Tests on Simulated data

Page 32: IVFC Signal Denoising

STM on Control data

Page 33: IVFC Signal Denoising

STM on Experiment data

Page 34: IVFC Signal Denoising

Real-time test on Experiment data

Data used    Threshold   Cell count
[0, 100]     0.04727     572
[0, 200]     0.04663     590
[0, 300]     0.04558     615
[0, 400]     0.04552     617
[0, 500]     0.04510     626
[0, 600]     0.04507     626
[0, 700]     0.04522     620
[0, 800]     0.04473     635
[0, 900]     0.04450     642
Whole data   0.04450     642

Table: Real-time test

Page 35: IVFC Signal Denoising

Consistency test on Experiment data

Counts on 100-second segments are summed and compared with the result of integral counting.

Data used   Summed   Integral   LSM
0-15 m1     652      641        295
15-30 m1    415      395        208
1h m1       229      225        NAN
72h m1      225      221        NAN
0-15 m2     621      614        68
45-60 m2    309      304        55
1h m2       196      200        41
0-15 m3     267      268        N.
30-45 m3    198      197        N.
1h m3       107      106        N.

Table: Consistency test

Page 36: IVFC Signal Denoising

LSM, LSMsd, STM

Page 37: IVFC Signal Denoising

Summary

For non-stationary time-series data from IVFC, GMM-based thresholding provides a consistent method for cell counting. Other statistical models and pattern-recognition methods might also be useful.

Page 38: IVFC Signal Denoising

Acknowledgements

Collaborators, for hard work and inspiration: Jin Guo (IPS), Guangda Liu (IPS), Xiaoying Tan (IPS), Prof. Xunbin Wei (IPS)

Visitors, for guidance on signal processing and statistics: David Damm (formerly at Bonn University), Keli Huang (formerly at Bonn University)

Prof. Axel Mosig and all members of the group, for all kinds of support.

Page 39: IVFC Signal Denoising

Bibliography I

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.

[2] Cliburn Chan, Feng Feng, Janet Ottinger, David Foster, Mike West, and Thomas B. Kepler. Statistical mixture modeling for cell subtype identification in flow cytometry. Cytometry, 73A(8):693–701, 2008.

Page 40: IVFC Signal Denoising

Bibliography II

[3] E. S. Costa, M. E. Arroyo, C. E. Pedreira, M. A. Garcia-Marcos, M. D. Tabernero, J. Almeida, and A. Orfao. A new automated flow cytometry data analysis approach for the diagnostic screening of neoplastic B-cell disorders in peripheral blood samples with absolute lymphocytosis. Leukemia, 20(7):1221–1230, 2006.

[4] D. Damm, C. Wang, X. Wei, and A. Mosig. Cell counting for in vivo flow cytometer signals using wavelet-based dynamic peak picking. In 2nd International Conference on Biomedical Engineering and Informatics (BMEI '09), pages 1–4. IEEE, 2009.

Page 41: IVFC Signal Denoising

Bibliography III

[5] D. L. Donoho. De-noising by soft-thresholding. IEEE Trans. Inform. Theory, 41(3):613–627, May 1995.

[6] D. L. Donoho and I. M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455, 1994.

[7] Irene Georgakoudi, Nicolas Solban, John Novak, William L. Rice, Xunbin Wei, Tayyaba Hasan, and Charles P. Lin. In vivo flow cytometry. Cancer Research, 64(15):5044–5047, 2004.

Page 42: IVFC Signal Denoising

Bibliography IV

[8] Ho Lee, Clemens Alt, Costas M. Pitsillides, Mehron Puoris'haag, and Charles P. Lin. In vivo imaging flow cytometer. Opt. Express, 14(17):7789–7800, Aug 2006.

[9] J. Novak, I. Georgakoudi, X. Wei, A. Prossin, and C. P. Lin. In vivo flow cytometer for real-time detection and quantification of circulating cells. Optics Letters, 29(1):77–79, 2004.

[10] John Novak. Development of the in vivo flow cytometer. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 2004.

Page 43: IVFC Signal Denoising

Bibliography V

[11] C. E. Pedreira, E. S. Costa, M. E. Arroyo, J. Almeida, and A. Orfao. A multidimensional classification approach for the automated analysis of flow cytometry data. IEEE Transactions on Biomedical Engineering, 55(3):1155–1162, 2008.

[12] Saumyadipta Pyne, Xinli Hu, Kui Wang, Elizabeth Rossin, Tsung-I Lin, Lisa M. Maier, Clare Baecher-Allan, Geoffrey J. McLachlan, Pablo Tamayo, David A. Hafler, Philip L. De Jager, and Jill P. Mesirov. Automated high-dimensional flow cytometric data analysis. Proceedings of the National Academy of Sciences, 106(21):8519–8524, May 2009.

Page 44: IVFC Signal Denoising

Bibliography VI

[13] X. Wei, D. A. Sipkins, C. M. Pitsillides, J. Novak, I. Georgakoudi, and C. P. Lin. Real-time detection of circulating apoptotic cells by in vivo flow cytometry. Molecular Imaging: Official Journal of the Society for Molecular Imaging, 4(4):415, 2005.
