Making Sense out of Flow Cytometry Data Overload
![Page 1: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/1.jpg)
© 2010 by University of Pennsylvania School of Medicine
Making Sense out of Flow Cytometry Data Overload
A crash course in R/Bioconductor and flow cytometry fingerprinting
![Page 2: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/2.jpg)
Outline
• Background
  • R
  • Bioconductor
• Motivating examples
• Starting R, entering commands
• How to get help
• R fundamentals
  • Sequences and Repeats
  • Characters and Numbers
  • Vectors and Matrices
  • Data Frames and Lists
  • Importing data from spreadsheets
• flowCore
  • Loading flow cytometry (FCS) data
  • gating
  • compensation
  • transformation
  • visualization
• flowFP
  • Binning
  • Fingerprinting
  • Comparing multivariate distributions
• Writing your own functions
• Installing and running R on your computer
• Suggestions for further reading and reference
![Page 3: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/3.jpg)
Background
• R
  • Is an integrated suite of software facilities for data manipulation, simulation, calculation and graphical display.
  • It handles and analyzes data very effectively and it contains a suite of operators for calculations on arrays and matrices.
  • In addition, it has the graphical capabilities for very sophisticated graphs and data displays.
  • It is an elegant, object-oriented programming language.
  • Started by Robert Gentleman and Ross Ihaka (hence “R”) in 1995 as a free, independent, open-source implementation of the S programming language (now part of Spotfire)
  • Currently maintained by the R Core development team, an international group of hard-working volunteer developers
  • http://www.r-project.org
  • http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf
![Page 4: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/4.jpg)
Background
• Bioconductor
  • “Is an open source and open development software project to provide tools for the analysis and comprehension of genomic data.”
  • Goals
    • To provide widespread access to a broad range of powerful statistical and graphical methods for the analysis of genomic data.
    • To provide a common software platform that enables the rapid development and deployment of extensible, scalable, and interoperable software.
    • To further scientific understanding by producing high-quality documentation and reproducible research.
    • To train researchers on computational and statistical methods for the analysis of genomic data.
  • http://bioconductor.org/overview
![Page 5: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/5.jpg)
A motivating example
I’ve just collected data from a T cell stimulation experiment in a 96-well plate format. I need to gate the data on CD3/CD4. How consistent are the distributions, so that I can establish one set of gates for the whole plate?
![Page 6: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/6.jpg)
A motivating example
![Page 7: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/7.jpg)
Another motivating example
I’m concerned that drawing gates to analyze my data introduces unintended bias. Additionally, since I have multiple data files, drawing multiple gates is time consuming. Can I use R to compute gates and then apply these same objective gating criteria to multiple data files?
![Page 8: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/8.jpg)
Another motivating example
Autogate lymphocytes and monocytes
Automatically analyze FMO tubes
![Page 9: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/9.jpg)
Back to the basics
• R is a command-line driven program
  • the prompt is: >
  • you type a command (shown in blue), and R executes the command and gives the answer (shown in black)
![Page 10: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/10.jpg)
Simple example: enter a set of measurements
• Use the function c() to combine terms together
• Create a variable named mfi
• Put the result of c() into mfi using the assignment operator <- (you can also use =)
• The [1] at the start of the output is the index of the first element printed on that line; the result is a vector
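The slide's screenshot is not preserved in this transcript, so here is a minimal sketch of the same steps. The MFI values are made up for illustration.

```r
# Hypothetical median fluorescence intensities (illustration only)
mfi <- c(1520, 1340, 1875, 1610)   # combine four values and assign with <-
mfi                                # typing the name prints the vector, prefixed by [1]

mfi2 = c(1520, 1340, 1875, 1610)   # = also assigns at the top level
identical(mfi, mfi2)               # TRUE

length(mfi)                        # 4
mean(mfi)                          # 1586.25
```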
![Page 11: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/11.jpg)
Help, functions, polymorphism
> help (log)
> ?log
> apropos("log")
![Page 12: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/12.jpg)
Vignettes – really good help!
![Page 13: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/13.jpg)
Sequences and Repeats
![Page 14: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/14.jpg)
Characters and Numbers
• Characters and character strings are enclosed in "" or ''
• Special numbers
  • NA: “Not Available”
  • Inf: “Infinity”
  • NaN: “Not a Number”
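A short sketch of how these values behave in practice. The measurements and marker names are illustrative assumptions, not data from the slides.

```r
# Strings can use either double or single quotes (straight quotes, not curly ones)
s1 <- "CD3"
s2 <- 'CD4'
label <- paste(s1, s2, sep = "/")     # "CD3/CD4"

# NA marks a missing measurement and propagates through arithmetic
x <- c(12.5, NA, 8.1)
mean(x)                               # NA
mean_ok <- mean(x, na.rm = TRUE)      # about 10.3, computed from the two known values
is.na(x)                              # FALSE TRUE FALSE

# Inf and NaN arise from arithmetic
1 / 0                                 # Inf
-1 / 0                                # -Inf
0 / 0                                 # NaN
is.nan(0 / 0)                         # TRUE
is.na(NaN)                            # TRUE: NaN also counts as missing
```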
![Page 15: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/15.jpg)
Vectors and Matrices
![Page 16: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/16.jpg)
Vectors and Matrices
• The subset operator for vectors and matrices is [ ]
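The slide's examples are only available as an image, so the following is a small stand-in showing [ ] on a vector and on a matrix. The values are arbitrary.

```r
v <- c(10, 20, 30, 40, 50)
v[2]             # second element: 20
v[c(1, 3)]       # elements 1 and 3: 10 30
v[-1]            # everything except the first element
v[v > 25]        # logical subsetting: 30 40 50

m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column
m[1, 2]          # row 1, column 2: 3
m[, 3]           # whole third column: 5 6
m[2, ]           # whole second row: 2 4 6
```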
![Page 17: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/17.jpg)
Vectors and Matrices
• You can extend the length of a vector via subsetting
… but not a matrix
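A minimal sketch of the difference, using arbitrary values: assigning past the end of a vector grows it, while the same attempt on a matrix raises a "subscript out of bounds" error.

```r
v <- c(1, 2, 3)
v[5] <- 10        # the vector grows to length 5; the gap at v[4] is filled with NA
v                 # 1 2 3 NA 10

m <- matrix(1:4, nrow = 2)
# try() catches the error so that the script keeps running
res <- try(m[3, 1] <- 99, silent = TRUE)
inherits(res, "try-error")   # TRUE: the assignment failed
dim(m)                       # still 2 2
```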
![Page 18: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/18.jpg)
Vectors and Matrices
• However, all’s not lost if you want to extend either the columns …
… or rows
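The slide's code is not preserved here. In base R the usual tools for this are cbind() and rbind(), so the sketch below assumes those are what the slide showed.

```r
m <- matrix(1:4, nrow = 2)

m_wide <- cbind(m, c(5, 6))   # add a column: result is 2 x 3
m_tall <- rbind(m, c(7, 8))   # add a row: result is 3 x 2

dim(m_wide)   # 2 3
dim(m_tall)   # 3 2
```

Both functions return a new matrix; the original m is unchanged unless you assign the result back to it.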
![Page 19: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/19.jpg)
Data Frames
• A Data Frame is like a matrix, except that the data type in each column need not be the same
  • Often, a Data Frame is created from an Excel spreadsheet using the function read.table()
  • Save As… a tab-delimited text file.
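To make the round trip concrete without needing Excel, this sketch writes a small tab-delimited file to a temporary location and reads it back with read.table(). The column names and values are invented for illustration.

```r
# Stand-in for a spreadsheet saved as tab-delimited text (hypothetical contents)
tab_file <- tempfile(fileext = ".txt")
writeLines(c("well\tstim\tpct_cd4",
             "A01\tnone\t41.2",
             "A02\tSEB\t38.7"),
           tab_file)

# header = TRUE uses the first line as column names; sep = "\t" splits on tabs
df <- read.table(tab_file, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

str(df)              # one row per well, columns of different types
df$pct_cd4           # a numeric column
sapply(df, class)    # character, character, numeric
```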
![Page 20: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/20.jpg)
Data Frames from spreadsheets
![Page 21: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/21.jpg)
Data Frames from spreadsheets
![Page 22: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/22.jpg)
Data Frames from spreadsheets
![Page 23: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/23.jpg)
Lists
![Page 24: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/24.jpg)
Handling Flow Cytometry Data: flowCore
• flowCore is a base package that supports reading and manipulation of FCS data files
• The fundamental object that encapsulates the data in an FCS file is a flowFrame
• A container object that holds a collection of flowFrames is called a flowSet
• In the next slides we will go over
  • reading an FCS file
  • gating
  • compensation
  • transformation
  • visualization
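The slides that follow survive only as images, so here is a hedged sketch of the reading, gating and transformation steps. It assumes the Bioconductor package flowCore is installed and uses the sample file that ships with it. The gate boundaries and the choice of arcsinh transform are illustrative assumptions, not the values used in the talk, and compensation is left out because it depends on a spillover matrix for your own instrument.

```r
# Runs only if flowCore is available; otherwise the block is skipped
if (requireNamespace("flowCore", quietly = TRUE)) {
  library(flowCore)

  # Read one FCS file into a flowFrame, keeping the data on its linear scale
  fcs_file <- system.file("extdata", "0877408774.B08", package = "flowCore")
  ff <- read.FCS(fcs_file, transformation = FALSE)
  print(ff)

  # A rectangular scatter gate (boundaries are made up for illustration)
  lymph_gate <- rectangleGate("FSC-H" = c(200, 600), "SSC-H" = c(0, 400),
                              filterId = "lymphs")
  fres <- flowCore::filter(ff, lymph_gate)   # evaluate the gate
  print(summary(fres))                       # counts and fraction inside the gate

  # Keep only the events inside the gate
  lymphs <- Subset(ff, lymph_gate)

  # Transform two fluorescence channels before plotting them
  tl <- transformList(c("FL1-H", "FL2-H"), arcsinhTransform())
  lymphs_t <- transform(lymphs, tl)
}
```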
![Page 25: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/25.jpg)
Check out the example data
![Page 26: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/26.jpg)
Read an FCS file, summarize the flowFrame
![Page 27: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/27.jpg)
![Page 28: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/28.jpg)
![Page 29: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/29.jpg)
![Page 30: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/30.jpg)
Apply the lymphocyte gate with Subset
![Page 31: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/31.jpg)
needs to be transformed because it is rendering the linear data in the FCS file
![Page 32: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/32.jpg)
hasn’t been compensated!
![Page 33: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/33.jpg)
![Page 34: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/34.jpg)
• Lines require library(fields)
• Percentages are in summary(fres)$p[1:4]
• Percentages are drawn in the graph with text()
![Page 35: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/35.jpg)
Fingerprinting Flow Cytometry Data: flowFP
• flowFP aims to transform flow cytometric data into a form amenable to algorithmic analysis tools
  • Acts as an intermediate step between acquisition of high-throughput FCM data and empirical modeling, machine learning and knowledge discovery
  • Implements ideas from
    Roederer M, Moore W, Treister A, Hardy RR & Herzenberg LA. Probability binning comparison: a metric for quantitating multivariate distribution differences. Cytometry 45:47-55, 2001.
    and
    Rogers WT, Moser AR, Holyst HA, Bantly A, Mohler ER III, Scangas G, and Moore JS, Cytometric Fingerprinting: Quantitative Characterization of Multivariate Distributions, Cytometry 73A: 430-441, 2008.
![Page 36: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/36.jpg)
The basic idea
• Subdivide multivariate space into bins
  • Call this a “model” of the space
• For each flowFrame in a flowSet, count the number of events in each bin in the model
• Flatten the collection of counts for a flowFrame into a 1D feature vector
• Combine all of the feature vectors together into an n x m matrix
  • n = number of flowFrames (instances)
  • m = number of bins in the model (features)
• Also, tag each event with its bin membership
  • facilitates visualization, interpretation
  • can be used for gating
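The steps above can be sketched in base R. This is a simplified stand-in and not flowFP's implementation: the samples are simulated matrices rather than flowFrames, and the helper names build_model and tag_events are hypothetical. The splitting rule used here (cut at the median of the parameter with the larger variance) is an assumption chosen to give roughly equal-sized bins.

```r
set.seed(1)

# Simulated samples: events x 2 parameters (stand-ins for flowFrames)
make_sample <- function(n, shift) cbind(p1 = rnorm(n, mean = shift), p2 = rnorm(n))
samples <- list(a = make_sample(2000, 0),
                b = make_sample(2000, 0),
                c = make_sample(2000, 1))

# Model: recursively split the training events at the median of the
# parameter with the larger variance; depth splits give 2^depth bins
build_model <- function(x, depth) {
  if (depth == 0) return(NULL)
  j <- which.max(apply(x, 2, var))
  cut <- median(x[, j])
  lo <- x[, j] <= cut
  list(param = j, cut = cut,
       left  = build_model(x[lo, , drop = FALSE], depth - 1),
       right = build_model(x[!lo, , drop = FALSE], depth - 1))
}

# Tag each event with its bin number by walking down the model
tag_events <- function(x, node, first_bin, depth) {
  if (is.null(node)) return(rep(first_bin, nrow(x)))
  lo <- x[, node$param] <= node$cut
  half <- 2^(depth - 1)
  bins <- numeric(nrow(x))
  bins[lo]  <- tag_events(x[lo, , drop = FALSE], node$left, first_bin, depth - 1)
  bins[!lo] <- tag_events(x[!lo, , drop = FALSE], node$right, first_bin + half, depth - 1)
  bins
}

depth <- 4                                          # 2^4 = 16 bins
model <- build_model(do.call(rbind, samples), depth)

# One row of bin counts per sample: the n x m fingerprint matrix
fp <- t(sapply(samples, function(x)
  tabulate(tag_events(x, model, 1, depth), nbins = 2^depth)))

dim(fp)       # 3 16
rowSums(fp)   # each row sums to that sample's event count (2000)
```

Samples a and b come from the same distribution, so their rows should look alike, while the shifted sample c piles events into some bins and empties others. That difference in the rows is what the fingerprint captures.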
![Page 37: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/37.jpg)
Probability Binning
![Page 38: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/38.jpg)
Probability Binning
![Page 39: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/39.jpg)
Probability Binning
![Page 40: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/40.jpg)
Probability Binning
[Plot axis label: Bin Number]
> plot (mod, fs)
![Page 41: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/41.jpg)
Class Constructors
• flowFPModel (base class)
  • Consumes a flowFrame or flowSet
  • Produces a model, which is a recipe for subdividing multivariate space
• flowFP
  • Consumes a flowFrame or flowSet, and a flowFPModel
  • Produces a flowFP, which represents the multivariate probability density function as a fingerprint
  • Also tags each event with its bin membership
• flowFPPlex
  • Consumes a collection of flowFPs
  • The flowFPPlex is a container object to facilitate handling large and complex collections of flowFPs
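A hedged sketch of the first two constructors in use. It assumes the Bioconductor package flowFP is installed and that its example flowSet fs1 is available through data(fs1). The number of recursions is an arbitrary choice for illustration, and flowFPPlex is left out of the sketch.

```r
# Runs only if flowFP is available; otherwise the block is skipped
if (requireNamespace("flowFP", quietly = TRUE)) {
  library(flowFP)
  data(fs1)                                   # example flowSet (assumed to ship with flowFP)

  # Build the binning model from the flowSet; 5 recursions give 2^5 = 32 bins
  mod <- flowFPModel(fs1, nRecursions = 5)

  # Apply the model to every flowFrame to get fingerprints
  fp <- flowFP(fs1, mod)

  # Matrix of event counts: one row per flowFrame, one column per bin
  fp_counts <- counts(fp)
  print(dim(fp_counts))
}
```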
![Page 42: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/42.jpg)
![Page 43: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/43.jpg)
![Page 44: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/44.jpg)
![Page 45: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/45.jpg)
![Page 46: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/46.jpg)
![Page 47: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/47.jpg)
Writing Your Own Functions
The slide labels five parts of the function: comments, declaration, code block, assignment and return.

    #
    # It's a good idea to comment your code
    #
    myfunc <- function (arg1=10, arg2, ...)
    {
        # your code goes here
        answer <- log (arg1, base=arg2)
        return (answer)
    }
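A short usage sketch for the function on this slide. The definition is repeated so that the example runs on its own, and the argument values are arbitrary.

```r
# Same function as on the slide
myfunc <- function(arg1 = 10, arg2, ...) {
  answer <- log(arg1, base = arg2)
  return(answer)
}

r1 <- myfunc(100, 10)            # positional arguments: log base 10 of 100, i.e. 2
r2 <- myfunc(arg2 = 10)          # arg1 falls back to its default of 10, giving 1
r3 <- myfunc(arg2 = 2, arg1 = 8) # named arguments can come in any order, giving 3
```

Because arg2 has no default, calling myfunc(100) fails with a "missing argument" error as soon as arg2 is used inside the function.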
![Page 48: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/48.jpg)
Writing Your Own Functions
![Page 49: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/49.jpg)
![Page 50: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/50.jpg)
Obtaining R and Bioconductor
• R http://cran.r-project.org/
• Bioconductor http://bioconductor.org/GettingStarted
![Page 51: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/51.jpg)
General Reference Material
• A good beginner’s guide to R
  http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf
• A nice one-page reference card
  http://cran.r-project.org/doc/contrib/Short-refcard.pdf
• Outstanding summary of R/Bioconductor, with many examples
  http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#R_favorite
• The definitive reference for writing R extensions (advanced!)
  http://cran.r-project.org/doc/manuals/R-exts.pdf
• Books
  • William N. Venables and Brian D. Ripley. Modern Applied Statistics with S. Fourth Edition. Springer, New York, 2002. ISBN 0-387-95457-0.
  • John M. Chambers. Programming with Data. Springer, New York, 1998. ISBN 0-387-98503-4 (aka “the Green Book”)
![Page 52: Making Sense out of Flow Cytometry Data Overload](https://reader030.fdocuments.in/reader030/viewer/2022033101/56815ac8550346895dc89845/html5/thumbnails/52.jpg)
Flow-Specific References
• Vignettes
  • http://bioconductor.org/packages/2.6/bioc/vignettes/flowCore/inst/doc/HowTo-flowCore.pdf
  • http://bioconductor.org/packages/2.6/bioc/vignettes/flowViz/inst/doc/filters.pdf
  • http://bioconductor.org/packages/2.6/bioc/vignettes/flowStats/inst/doc/GettingStartedWithFlowStats.pdf
  • http://bioconductor.org/packages/2.6/bioc/vignettes/flowQ/inst/doc/DataQualityAssessment.pdf
  • http://bioconductor.org/packages/2.6/bioc/vignettes/flowFP/inst/doc/flowFP_HowTo.pdf
• Original Articles
  • flowCore
    Hahne, F., N. LeMeur, et al. (2009). "flowCore: a Bioconductor package for high throughput flow cytometry." BMC Bioinformatics 10: 106.
  • Fingerprinting
    Rogers, W. T., A. R. Moser, et al. (2008). "Cytometric fingerprinting: quantitative characterization of multivariate distributions." Cytometry A 73(5): 430-41.
    Rogers, W. T. and H. A. Holyst (2009). "flowFP: A Bioconductor Package for Fingerprinting Flow Cytometric Data." Advances in Bioinformatics 2009(Article ID 193947): 11.