Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of...

24
Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide Risso

Transcript of Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of...

Page 1: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Analysisofsingle-cellRNA-seqdatawithRandBioconductor.

FannyPerraudeau,KellyStreet,DavideRisso

Page 2: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Tasksandpackages

•  DimensionalityreducDonwithzinbwave.– bioconductor.org/packages/zinbwave

•  ClusteranalysiswithclusterExperiment.– bioconductor.org/packages/clusterExperiment

•  Lineageinferenceandtrajectoryanalysiswithslingshot.– github.com/kstreet13/slingshot

Workshopmaterial:github.com/fperraudeau/bioc2017singlecell

Page 3: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Acknowledgements UC Berkeley

Methods development Sandrine Dudoit

Elizabeth Purdom Nir Yosef

Svetlana Gribkova JP Vert

Experiments and analyses Russell Fletcher

Diya Das John Ngai Lab

Core Facilities QB3 Functional Genomics Laboratory

QB3 Genomics Sequencing Laboratory Cancer Research Laboratory

Funding

NIH BRAIN Initiative Cell Census Consortium

Page 4: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Adult stem cells maintain and regenerate tissues

Bond, Ming, Song (2015) Cell Stem Cell

Page 5: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Fletcher et al. (2017) Cell Stem Cell

Page 6: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Usefullinks

•  ZINB-WaVEpreprint:– hYps://doi.org/10.1101/125112

•  Slingshotpreprint:– hYps://doi.org/10.1101/128843

•  Pleasesubmitbugreportsandotherissuesat:– github.com/drisso/zinbwave/issues– github.com/epurdom/clusterExperiment/issues– github.com/kstreet13/slingshot/issues

Page 7: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Contactus!

•  E-mail–  [email protected]– [email protected]–  [email protected]

•  TwiYer:– @fannyperraudeau– @KStreetBerkeley– @drisso1893

Page 8: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

RSEC:Resampling-basedSequenDalEnsembleClustering

•  UseaclusteringrouDnethatfindsalargenumberofsmall,coherentclusters:–  Subsamplingofdatatofindrobustclusters–  SequenDalclusteringàfindagroupofcoherentsamples,removethem,startover

•  PerformthisrouDneovermanydifferentparameters•  Findasingleconsensusovertheclusterings•  Mergetogethernon-differenDalclusters•  FindbiomarkersviadifferenDalexpressionwithtargetedcomparisons

•  ImplementedwithvisualizaDontoolsinclusterExperiment package

Page 9: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Whatmeanbysubsampleclustering?

•  Pickanunderlyingclusteringstrategy(e.g.kmeansorPAMwithparDcularchoiceofK)

•  Repeatthefollowing:– Subsamplethedata,e.g.70%ofsamples– Findclustersonthesubsample

•  CreatecoClusteringmatrixD:%ofsubsampleswheresampleswereinsamecluster

Page 10: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

ExamplesofmatrixD

Note,hereforcedthesamplesinordergivenbyPAMAlsousedkmeansinresampling,ratherthanPAM

K=3 K=7

Page 11: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Whatmeanbysubsampleclustering?

•  Pickanunderlyingclusteringstrategy(e.g.kmeansorPAMwithparDcularchoiceofK)

•  Repeatthefollowing:–  Subsamplethedata,e.g.70%ofsamples–  Findclustersonthesubsample

•  CreatecoClusteringmatrixD:%ofsubsampleswheresampleswereinsamecluster

•  ClustermatrixDforfinalclustering–  Couldbewithoriginalclusteringstrategy,ordifferentone–  “Right”KmaynotbetheKusedinoriginalclustering(e.g.kmeans)

–  Weuseamoreflexibleapproachofhierarchicalclusteringandpickingclusterssohaveatleast1-αsimilarity

–  ChangefrompickingKtopickingα,moreintuiDvechoice–  Notallsamplesgetclustered

Page 12: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

WhatmeanbysequenDalclustering?

•  OverrangeofstarDngparameters,doclustering•  Theclusterthatstaysatleastβsimilar,idenDfyascluster

andremove•  RepeatunDlnomoresuchclustersfound,ornotenough

sampleslen•  Drawsonideasof“Dghtclustering”ofgenesofTsengand

Wong(2005)

Specifically,•  WerangeoverKinunderlyingPAMinsubsampling•  Wefindclustersbasedonresultsofsubsampleclustering

(somaynotbesameKasinputparameter)

Page 13: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Becauseofsubsampling,changingKismoreaperturbaDon

K=4 K=5

K=6 K=7

Page 14: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Findconsensuscluster

•  Createaco-ClusteringmatrixDofhowmanyDmesco-clustertogetherandclusterD– Likewithsubsampling,onlynowacrossdifferentparameters

– Again,importantthatclustersarelargelyperturbaDons,notradicallydifferentclusters

Page 15: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Bioconductorworkflowforsingle-cellRNA-seqdataanalysis:

dimensionalityreduc<on,clustering,andpseudo<meordering.

FannyPerraudeau,DavideRisso,KellyStreetBioC2017

July28th,2017

Page 16: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

Clustereddatainlow-dimensionalspace.Weconsiderdimensionalityreduc<onandclusteringtobeseparateproblems,butgenerallypreferPCAandRSEC.

2

Step0:Input

●●

●● ●

●●

●●●

● ●●

●●

●●●

●● ●●●

●●

●●●

●●

● ●●●

●●●●

●●

●● ●

●●

●●● ●

●●

●●

●●

●● ●● ●

●● ●●

●●

●●● ●

●●●●

● ● ●

●●●

●●●●

●●

●●

●●

●●

● ●●

●● ●

●●

●●

●● ●

●●

●●●

● ●●

●●

●●●

●● ●●●

●●

●●●

●●

● ●●●

●●●●

●●

●● ●

●●

●●● ●

●●

●●

●●

●● ●● ●

●● ●●

●●

●●● ●

●●●●

● ● ●

●●●

●●●●

●●

●●

●●

●●

● ●●

●● ●

●●

Page 17: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

●●

●● ●

●●

●●●

● ●●

●●

●●●

●● ●●●

●●

●●●

●●

● ●●●

●●●●

●●

●● ●

●●

●●● ●

●●

●●

●●

●● ●● ●

●● ●●

●●

●●● ●

●●●●

● ● ●

●●●

●●●●

●●

●●

●●

●●

● ●●

●● ●

●●

● ● ●

3

Step1:ClusterMST

Constructminimumspanningtree(MST)onclusters.Thenodesareclusters,notclustercenters.Requiresadistancemetric.

Page 18: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

●●

●● ●

●●

●●●

● ●●

●●

●●●

●● ●●●

●●

●●●

●●

● ●●●

●●●●

●●

●● ●

●●

●●● ●

●●

●●

●●

●● ●● ●

●● ●●

●●

●●● ●

●●●●

● ● ●

●●●

●●●●

●●

●●

●●

●●

● ●●

●● ●

●●

● ● ●

4

Step1:ClusterMST

Constructminimumspanningtree(MST)onclusters.Selectstar<ngclusterbasedonmarkergenesorparsimony.

Page 19: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

●●

●● ●

●●

●●●

● ●●

●●

●●●

●● ●●●

●●

●●●

●●

● ●●●

●●●●

●●

●● ●

●●

●●● ●

●●

●●

●●

●● ●● ●

●● ●●

●●

●●● ●

●●●●

● ● ●

●●●

●●●●

●●

●●

●●

●●

● ●●

●● ●

●●

● ● ●

5

Step1:ClusterMST

Specifyknownterminalclustersforaddi<onalsupervision.Thisresultsintheconstruc<onofaconstrainedMST.

Page 20: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

●●

●● ●

●●

●●●

● ●●

●●

●●●

●● ●●●

●●

●●●

●●

● ●●●

●●●●

●●

●● ●

●●

●●● ●

●●

●●

●●

●● ●● ●

●● ●●

●●

●●● ●

●●●●

● ● ●

●●●

●●●●

●●

●●

●●

●●

● ●●

●● ●

●●

● ● ●

6

Step1:ClusterMST

Constructminimumspanningtree(MST)onclusters.Selectstar<ngclusterbasedonmarkergenesorparsimony.

Page 21: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

●●

●● ●

●●

●●●

● ●●

●●

●●●

●● ●●●

●●

●●●

●●

● ●●●

●●●●

●●

●● ●

●●

●●● ●

●●

●●

●●

●● ●● ●

●● ●●

●●

●●● ●

●●●●

● ● ●

●●●

●●●●

●●

●●

●●

●●

● ●●

●● ●

●●

7

Step1.5:PrincipalCurves

Highlystable,nonlineargeneraliza<onofprincipalcomponents.Fitsacurvetothe“middle”ofthedata.Inconsistent,failstoreflectunderlyingbiology(ie.branching).

Page 22: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

●●

●● ●

●●

●●●

● ●●

●●

●●●

●● ●●●

●●

●●●

●●

● ●●●

●●●●

●●

●● ●

●●

●●● ●

●●

●●

●●

●● ●● ●

●● ●●

●●

●●● ●

●●●●

● ● ●

●●●

●●●●

●●

●●

●●

●●

● ●●

●● ●

●●

8

Step2:SimultaneousPrincipalCurves

S<llhighlystableandnonlinear.Extendstheconceptofprincipalcurvestomul<ple,branchingcurveswithcommonorigin(smoothtreestructures).

Page 23: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

●●

●● ●

●●

●●●

● ●●

●●

●●●

●● ●●●

●●

●●●

●●

● ●●●

●●●●

●●

●● ●

●●

●●● ●

●●

●●

●●

●● ●● ●

●● ●●

●●

●●● ●

●●●●

● ● ●

●●●

●●●●

●●

●●

●●

●●

● ●●

●● ●

●●

9

Step2:SimultaneousPrincipalCurves

S<llhighlystableandnonlinear.Extendstheconceptofprincipalcurvestomul<ple,branchingcurveswithcommonorigin(smoothtreestructures).

Page 24: Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of single-cell RNA-seq data with R and Bioconductor. Fanny Perraudeau, Kelly Street, Davide

10

Step2:SimultaneousPrincipalCurves

Projectcellsontocurvestoobtainpseudo<mevalues.Likelinearprincipalcomponents,thecurvesseektominimizesquaredprojec<ondistance(subjecttosomeconstraints).

●●

●● ●

●●

●●●

● ●●

●●

●●●

●● ●●●

●●

●●●

●●

● ●●●

●●●●

●●

●● ●

●●

●●● ●

●●

●●

●●

●● ●● ●

●● ●●

●●

●●● ●

●●●●

● ● ●

●●●

●●●●

●●

●●

●●

●●

● ●●

●● ●

●●