MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of...

13
RESEARCH ARTICLE SUMMARY MALARIA The Malaria Cell Atlas: Single parasite transcriptomes across the complete Plasmodium life cycle Virginia M. Howick*, Andrew J. C. Russell*, Tallulah Andrews, Haynes Heaton, Adam J. Reid, Kedar Natarajan, Hellen Butungi, Tom Metcalf, Lisa H. Verzier, Julian C. Rayner, Matthew Berriman, Jeremy K. Herren, Oliver Billker, Martin Hemberg, Arthur M. Talman, Mara K. N. Lawniczak†‡ INTRODUCTION: Plasmodium parasites, the causative agent of malaria, are single-celled organisms with distinct morphological devel- opmental stages each specialized to inhabit vastly different environments and host cell types. Underlying this morphological diver- sity is tight regulation of a compact genome, where the functions of ~40% of genes remain unknown, hampering the rate of effective drug and vaccine development. Single-cell RNA sequencing (scRNA-seq) has allowed high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell variation, and its application to unicellular organisms reveals individual-level variation between parasites across their full life cycle. RATIONALE: We have assembled a Malaria Cell Atlas that presents the transcriptomic profiles of individual Plasmodium parasites across all morphological life cycle stages. The ambition of such an atlas is to (i) inform gene function and usage throughout the life cycle, (ii) understand the gene regulatory mecha- nisms underlying developmental transitions, (iii) discover parasite bet-hedging patterns, and (iv) provide a reference dataset that can be used to understand parasite biology by the malaria community in both lab and natural infections for multiple Plasmodium species. RESULTS: We isolated 1787 parasites using cell sorting and profiled full-length transcriptomes at 10 time points covering all life cycle stages across both the vector mosquito and the mam- malian host. From these data, we could un- derstand fine-scale transcriptional patterns of development and identify marker genes as- sociated with parasite stage, cellular strategy (replicative, growth, and sexual phases), and host environment. Comparing single-cell gene expression patterns across the life cycle re- vealed groups of genes expressed in similar patterns during development. The resulting clus- ters of genes that behave similarly enables in- ference of possible function for the ~40% of genes that remain uncharacterized. Using drop- let sequencing, we sequenced a further 15,858 cells from the intraerythrocytic developmental cycle for three different species, including two human pathogens. We aligned developmental trajectories across species during the pathogenic phase of the life cycle, es- tablishing a cross-species comparison method. Final- ly, we developed a protocol for preserving wild para- sites collected from naturally infected car- riers and used scRNA-seq, together with the Malaria Cell Atlas as a reference, to identify wild parasite developmental stages and char- acterize a natural mixed-species infection at single-cell resolution. CONCLUSION: We generated transcriptomes for all life cycle stages of Plasmodium and re- leased these via the interactive Malaria Cell Atlas website, www.sanger.ac.uk/science/tools/ mca/mca/. The Malaria Cell Atlas provides new insights into gene function and parasite devel- opmental progression. We have demonstrated that it can serve as a transcriptomic reference, facilitating the interpretation of data from mul- tiple species and multiple technologies. The characterization of wild Plasmodium parasites with immense genetic diversity will advance the study of the pathology and transmission of ma- laria directly from infected carriers. We envision that the Malaria Cell Atlas will support the development of much-needed new drugs, vac- cines, and transmission-blocking strategies. RESEARCH Howick et al., Science 365, 774 (2019) 23 August 2019 1 of 1 The list of author affiliations is available in the full article online. *These authors contributed equally to this work. These authors contributed equally to this work. Corresponding author. Email: [email protected] Cite this article as V. M. Howick et al., Science 365, eaaw2619 (2019). DOI: 10.1126/science.aaw2619 P. knowlesi P. falciparum P. malariae P. berghei Oocyst Ookinete/oocyst Bolus ookinete Merozoite Trophozoite Liver stage Ring Schizont Gland sporozoite Injected sporozoite Female gametocyte Male gametocyte UMAP1 UMAP2 PC3 PC2 PC1 PC1 PC2 PC1 PC2 PC1 PC2 In vivo model system In vitro culture Clinical isolates P. vivax P. ovale wallikeri P. ovale curtisi Single-cell RNA-seq references for the Plasmodium genus. Left: Single-cell transcriptomes from across the life cycle of Plasmodium berghei were profiled (including liver, blood, and mosquito life stages). Center: Deep exploration of blood-stage parasites captured transcriptomic diversity at single-cell resolution across three different Plasmodium species by droplet sequencing. Right: Such datasets can serve as references to understand wild parasites isolated from clinical samples. ON OUR WEBSITE Read the full article at http://dx.doi. org/10.1126/ science.aaw2619 .................................................. on November 29, 2020 http://science.sciencemag.org/ Downloaded from

Transcript of MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of...

Page 1: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

RESEARCH ARTICLE SUMMARY◥

MALARIA

The Malaria Cell Atlas: Single parasitetranscriptomes across the completePlasmodium life cycleVirginia M. Howick*, Andrew J. C. Russell*, Tallulah Andrews, Haynes Heaton,Adam J. Reid, Kedar Natarajan, Hellen Butungi, Tom Metcalf, Lisa H. Verzier,Julian C. Rayner, Matthew Berriman, Jeremy K. Herren, Oliver Billker,Martin Hemberg, Arthur M. Talman†, Mara K. N. Lawniczak†‡

INTRODUCTION: Plasmodium parasites, thecausative agent of malaria, are single-celledorganisms with distinct morphological devel-opmental stages each specialized to inhabitvastly different environments and host celltypes. Underlying this morphological diver-sity is tight regulation of a compact genome,where the functions of ~40% of genes remainunknown, hampering the rate of effectivedrug and vaccine development. Single-cellRNA sequencing (scRNA-seq) has allowedhigh-resolution mapping of developmentalprocesses, cellular diversity, and cell-to-cellvariation, and its application to unicellularorganisms reveals individual-level variationbetween parasites across their full life cycle.

RATIONALE:We have assembled a MalariaCell Atlas that presents the transcriptomicprofiles of individual Plasmodium parasitesacross all morphological life cycle stages. Theambition of such an atlas is to (i) inform genefunction and usage throughout the life cycle,

(ii) understand the gene regulatory mecha-nisms underlying developmental transitions,(iii) discover parasite bet-hedging patterns,and (iv) provide a reference dataset that canbe used to understand parasite biology by themalaria community in both lab and naturalinfections for multiple Plasmodium species.

RESULTS:We isolated 1787 parasites using cellsorting and profiled full-length transcriptomesat 10 time points covering all life cycle stagesacross both the vector mosquito and the mam-malian host. From these data, we could un-derstand fine-scale transcriptional patterns ofdevelopment and identify marker genes as-sociated with parasite stage, cellular strategy(replicative, growth, and sexual phases), andhost environment. Comparing single-cell geneexpression patterns across the life cycle re-vealed groups of genes expressed in similarpatterns during development. The resulting clus-ters of genes that behave similarly enables in-ference of possible function for the ~40% of

genes that remain uncharacterized. Using drop-let sequencing, we sequenced a further 15,858cells from the intraerythrocytic developmentalcycle for three different species, including twohuman pathogens. We aligned developmental

trajectories across speciesduring the pathogenicphase of the life cycle, es-tablishing a cross-speciescomparison method. Final-ly, we developed a protocolfor preserving wild para-

sites collected from naturally infected car-riers and used scRNA-seq, together with theMalaria Cell Atlas as a reference, to identifywild parasite developmental stages and char-acterize a natural mixed-species infection atsingle-cell resolution.

CONCLUSION:We generated transcriptomesfor all life cycle stages of Plasmodium and re-leased these via the interactive Malaria CellAtlas website, www.sanger.ac.uk/science/tools/mca/mca/. The Malaria Cell Atlas provides newinsights into gene function and parasite devel-opmental progression. We have demonstratedthat it can serve as a transcriptomic reference,facilitating the interpretation of data from mul-tiple species and multiple technologies. Thecharacterization of wild Plasmodium parasiteswith immense genetic diversity will advance thestudy of the pathology and transmission of ma-laria directly from infected carriers. We envisionthat the Malaria Cell Atlas will support thedevelopment of much-needed new drugs, vac-cines, and transmission-blocking strategies.▪

RESEARCH

Howick et al., Science 365, 774 (2019) 23 August 2019 1 of 1

The list of author affiliations is available in the full article online.*These authors contributed equally to this work.†These authors contributed equally to this work.‡Corresponding author. Email: [email protected] this article as V. M. Howick et al., Science 365,eaaw2619 (2019). DOI: 10.1126/science.aaw2619

P. knowlesi

P. falciparum

P. malariae

P. bergheiOocyst

Ookinete/oocyst

Bolus ookinete

Merozoite

Trophozoite

Liver stage

RingSchizont Gland sporozoite

Injected sporozoite

Femalegametocyte

Malegametocyte

UMAP1

UMAP

2

0

PC3

PC2

PC1

PC1

PC2

PC1

PC2

PC1

PC2

In vivo model system In vitro culture Clinical isolates

P. vivax

P. ovale wallikeriP. ovale curtisi

Single-cell RNA-seq references for the Plasmodium genus. Left: Single-cell transcriptomes from across the life cycle of Plasmodium bergheiwere profiled (including liver, blood, and mosquito life stages). Center: Deep exploration of blood-stage parasites captured transcriptomicdiversity at single-cell resolution across three different Plasmodium species by droplet sequencing. Right: Such datasets can serve as referencesto understand wild parasites isolated from clinical samples.

ON OUR WEBSITE◥

Read the full articleat http://dx.doi.org/10.1126/science.aaw2619..................................................

on Novem

ber 29, 2020

http://science.sciencemag.org/

Dow

nloaded from

Page 2: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

RESEARCH ARTICLE◥

MALARIA

The Malaria Cell Atlas: Single parasitetranscriptomes across the completePlasmodium life cycleVirginia M. Howick1*, Andrew J. C. Russell1*, Tallulah Andrews1, Haynes Heaton1,Adam J. Reid1, Kedar Natarajan2, Hellen Butungi3,4, Tom Metcalf1, Lisa H. Verzier1,5,Julian C. Rayner1, Matthew Berriman1, Jeremy K. Herren3,4,6, Oliver Billker1,7,Martin Hemberg1, Arthur M. Talman1,8†, Mara K. N. Lawniczak1†‡

Malaria parasites adopt a remarkable variety of morphological life stages as they transitionthrough multiple mammalian host and mosquito vector environments. We profiled thesingle-cell transcriptomes of thousands of individual parasites, deriving the first high-resolution transcriptional atlas of the entire Plasmodium berghei life cycle. We then usedour atlas to precisely define developmental stages of single cells from three differenthuman malaria parasite species, including parasites isolated directly from infectedindividuals. The Malaria Cell Atlas provides both a comprehensive view of gene usage in aeukaryotic parasite and an open-access reference dataset for the study of malaria parasites.

Single-cell RNA sequencing (scRNA-seq)is revolutionizing our understanding ofheterogeneous cell populations, revealingrare cell types, unraveling developmentalprocesses, and enabling greater resolution

of gene expression patterns than has previouslybeen possible (1). The ambition of cataloging thecomplete cellular composition of an animal isalready becoming reality (2, 3), but thus far,atlasing efforts have focused on multicellularorganisms. Here, we present the first compre-hensive cell atlas of a unicellular eukaryote,the malaria parasite, across the entirety of itslife cycle.Although malaria parasites are unicellular,

they display remarkable cellular plasticity dur-ing their complex life cycle, with stages rangingfrom 1.2 to 50 mm and spanning vastly differenthuman andmosquito environments. Clinical symp-toms of malaria result from asexual replication

within red blood cells, whereas transmission tonew hosts relies on replication in the mosquito.Both disease development and transmission aretherefore underpinned by the parasite’s abilityto serially differentiate into morphologically dis-tinct forms, including invasive, replicative, andsexual stages (Fig. 1A). This versatility is orches-trated by tight regulation of a compact genome,where the function of ~40% of genes remainsunknown (4). Better understanding of gene useand gene function throughout the parasite’s lifecycle is needed to inform the development ofmuch-needed new drugs, vaccines, and transmission-blocking strategies.

Single-cell resolution transcriptionalvariation provides insights intogene usage

To begin to build the Malaria Cell Atlas, weprofiled 1787 single-cell transcriptomes acrossthe entire life cycle of Plasmodium berghei usinga modified Smart-seq2 approach (5). Purificationmethods were adapted to isolate each stage ofthe life cycle, including challenging samplessuch as rings, which have low levels of RNA, andookinetes, which are difficult to sort (fig. S1).Ninety percent of sequenced cells passed qualitycontrol (1787/1982 cells) and poor-quality cellswere identified in each stage according to thedistribution of the number of genes per cell (fig.S2). After quality control, we detected a mean of1527 genes per cell across the entire dataset;however, the number of genes detected washighly dependent on parasite stage (P < 0.001;fig. S2). Transcriptomes were normalized withTMM (trimmed mean of M-values) in groups ofrelated stages for further analysis. For samplesexpected to be overlapping or heterogeneous

(e.g., the blood stages), we used k-means clus-tering to delineate stages and confirm their clas-sification based on known marker genes andcorrelations with bulk reference datasets (figs. S3and S4). This allowed for differentiation of male,female, and asexual stages in the blood, as well asbetween ookinetes and oocysts in the heteroge-neous population of parasites taken from themosquito midgut during ookinete invasion.All cell transcriptomes were visualized using

uniform manifold approximation and projec-tion (UMAP) (6) (Fig. 1B) and the first three com-ponents of a principal components analysis (PCA)(Fig. 1C). The results showed orientation of cellsalong a developmental path and also, to someextent, the formation of groups according tocellular strategy and host environment [e.g.,actively replicative stages such as trophozoitesand oocysts are near each other in UMAP andthe first two principal components, whereas thethird principal component separates the cells byhost; Fig. 1, B and C]. All stages displayedmarkergenes concordant with known expression pat-terns (fig. S5). Merozoites, rings, trophozoites,and schizonts formed a circle, capturing thecyclical nature of the asexual intraerythrocyticdevelopmental cycle (IDC) (Fig. 1B and fig. S3).A portion of this developmental trajectory wasclosely paralleled by the 44-hour liver schizonts.For these cells, we also captured the host’stranscriptome, confirming at a single-cell levelthat the parasite’s developmental progression isindependent of the host cell’s cell cycle state (7)(fig. S6). In the mosquito stages, we observed aclear and abrupt transition from ookinetes tooocysts 48 hours after an infectious blood mealas the parasite crossed the mosquito midgut(fig. S4), and we were able to identify genes thatare differentially expressed along this under-studied developmental trajectory (data S1). Inthe two sporozoite collections (direct from sali-vary glands versus injected by infected mosqui-toes into a mock host), we detected that uponrelease from the glands into the host, sporozoitesexpress twice as many genes and up-regulategenes necessary for host invasion (data S1). Thisincrease in the number of detected genes mightindicate an activation of sporozoites in the mam-malian host prior to liver cell invasion.Our survey of the P. berghei life cycle enables

a global view of gene expression and “guilt-by-association” prediction of function based oncoexpression patterns. We constructed a force-directed k–nearest neighbor (k-NN) graph, whereeach node represents one of 5156 genes detectedin the dataset (data S1). Graph spectral cluster-ing (8) was used to assign each gene to one of20modules based on the graph distancematrix(Fig. 2A and data S1). Clusters were enriched forgenes involved in specific biological processesand displayed distinct patterns of expressionthroughout the life cycle (Fig. 2B, fig. S7, anddata S1). Some gene clusters (1 and 2) consistedmainly of housekeeping genes and ribosomalRNA components highly expressed across thefull life cycle. At the other extreme, severalclusters (clusters 18 to 20) showed low mean

RESEARCH

Howick et al., Science 365, eaaw2619 (2019) 23 August 2019 1 of 11

1Wellcome Sanger Institute, Wellcome Genome Campus,Cambridge CB10 1SA, UK. 2Danish Institute of AdvancedStudy (D-IAS), Department of Biochemistry and MolecularBiology, University of Southern Denmark, Odense, Denmark.3International Centre of Insect Physiology and Ecology(icipe), Nairobi, Kenya. 4Wits Research Institute for Malaria,MRC Collaborating Centre for Multi-disciplinary Research onMalaria, School of Pathology, Faculty of Health Sciences,University of the Witswatersrand, Johannesburg, SouthAfrica. 5Walter and Eliza Hall Institute of Medical Research,Parkville, Victoria, Australia. 6MRC–University of GlasgowCentre for Virus Research, Glasgow, UK. 7Laboratory forMolecular Infection Medicine Sweden, Department ofMolecular Biology, Umeå University, Umeå, Sweden.8MIVEGEC, IRD, CNRS, University of Montpellier,Montpellier, France.*These authors contributed equally to this work. †These authorscontributed equally to this work.‡Corresponding author. Email: [email protected]

on Novem

ber 29, 2020

http://science.sciencemag.org/

Dow

nloaded from

Page 3: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

expression across cells and were primarily com-posed of genes from rapidly evolving multi-gene families (pirs and fams), which have no1:1 orthologs with P. falciparum (Fig. 2B andfig. S8). Ten gene clusters (clusters 7 to 16) werehighly expressed in a single stage.We corroboratedthese stage-specific gene modules using twoadditional methods. First, we identified markergenes based on level of expression relative to allother stages (data S1). Additionally, for eachstage we defined a core transcriptome of geneswhere transcripts were detected in >50%of cells(data S1 and S2). The number of genes unique tothe core transcriptome for each canonical stageranged from 0 in merozoites to 237 in oocysts(data S1). Core genes for each stage were over-

represented in clusters that coincided with ex-pression at that stage and contained genesinvolved in the cellular strategy of that stage(e.g., DNA replication, invasion, sexual devel-opment; fig. S9), confirming our module assign-ment to specific stages.The majority of gene clusters showed pre-

dominant expression in specific stages (Fig. 2Band fig. S7), offering new guidance as to whereand how these genes might function. For ex-ample, genes encoding CelTOS and circumsporo-zoite protein (CSP), both important for invasion(9, 10), were found in cluster 16, which contains79 genes most highly expressed in the invasiveookinete and sporozoite stages. Among these79 genes are 34 annotated only as “conserved

Plasmodium protein with no known function.”Their highly correlated expression with knowninvasion genes and their high expression ininvasive stages will help to inform future func-tional studies. We also overlaid asexual growthrate data from a genome-scale knockout screen(11). Genes expressed primarily in transmissionstages (clusters 11 to 16) tended to show normalgrowth rates in asexual blood stages (Fig. 2C),offering further support for the idea that genesin these clusters are primarily important intransmission stages. For genes in each cluster,we also identified motifs enriched in their up-stream regulatory region [–1000 base pairs (bp)],which could be binding sites for apetala-2 (AP2)transcription factors that play critical roles in

Howick et al., Science 365, eaaw2619 (2019) 23 August 2019 2 of 11

Fig. 1. A single-cell atlas of the P. berghei life cycle. (A) The life cyclebegins when an infected mosquito injects sporozoites into the mammalianhost. From here, parasites enter the liver, where they develop, replicate,and then egress to enter the IDC. During the IDC, parasites invadeerythrocytes, where they develop, replicate asexually, burst, and re-invadeerythrocytes cyclically. Sexual forms are taken up by the mosquito, and iffertilization is successful, parasites invade the midgut and subsequently

the salivary glands of the mosquito. In these different environments,parasites adopt different cellular strategies: replicative stages (liver stage,schizont, oocyst), invasive stages (merozoite, ookinete, and sporozoite),and sexual stages (male and female gametocytes). (B) UMAP of single-celltranscriptomes sampled from all stages of the life cycle, with cellscolored according to their stage from (A). (C) The first three principalcomponents from transcriptomes of all stages in the life cycle.

RESEARCH | RESEARCH ARTICLEon N

ovember 29, 2020

http://science.sciencem

ag.org/D

ownloaded from

Page 4: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

parasite progression through the life cycle (fig.S10 and data S1). Several of these motifs areconserved with P. falciparum in asexual stages(12) and transmission stages (13) (fig. S10). Thiscategorization will help to functionally anno-tate genes with no known function, thereby

enabling informed studies on gene regulationand supporting efforts to identify good candi-dates for transmission-blocking drug and vac-cine development.Development is the primary driver of differ-

ences in gene expression across the life cycle.

However, variation between individual parasiteswithin developmental stages is important foradaptation to the host environment (14). Theprincipal mechanism for intrastage variation isthought to be driven by variation in expressionamong members of large multigene families

Howick et al., Science 365, eaaw2619 (2019) 23 August 2019 3 of 11

Fig. 2. Graph-based clustering of genes reveals gene usagethroughout the life cycle. (A) A k-NN force-directed graph of all 5156detected genes. Each node represents a gene. Nodes are coloredaccording to their graph-based spectral clustering assignment, and eachcluster is labeled by cluster number. (B) A heat map of mean expressionfor each cluster across all cells in the dataset. Cells are ordered by theirdevelopmental progression. Shown at the right are the number of genes ineach cluster (N), the most significant Gene Ontology term (biological

process) associated with the cluster (from data S1, Benjamini P < 0.05),and the top cited gene in the cluster (based on PlasmoDB). In theclusters where there is a tie for lowest P value, the GO term with thegreatest percentage of genes in the cluster relative to the background isshown. If terms had identical representation, the term with the lowestGO number is shown (see complete table in data S1). (C) The samegraph as in (A) colored according to relative growth rate of knockoutmutants in asexual blood-stage parasites (11).

RESEARCH | RESEARCH ARTICLEon N

ovember 29, 2020

http://science.sciencem

ag.org/D

ownloaded from

Page 5: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

Howick et al., Science 365, eaaw2619 (2019) 23 August 2019 4 of 11

Fig. 3. Alignment of datasets reveals transcriptional rates in the IDC.(A) P. berghei 10x data mapped to Smart-seq2 data using scmap-cell.Cells are grouped according to their 10x cluster assignment (figs. S13 andS15) and the cluster of the Smart-seq2 cell it mapped to (fig. S15).A cosine similarity threshold of 0.5 led to classification of 283 cells (<6%of cells) as unassigned (UA). (B) PCA of P. berghei IDC cells from 10x.Pseudotime of each cell was measured by fitting an ellipse to the data andcalculating the angle (radians) around the center of this ellipse for eachcell relative to the start cell (red point). Black points represent the meanPCA coordinates of the bulk prediction for each cell (22) (fig. S14).

(C) Left: PCAs of three Plasmodium species colored by their P. berghei cellassignment based on scmap. Arrows represent the relative change intranscriptional state based on RNA velocity. Right: Scaled increase inexpression over the IDC. Cells are ordered according to the pseudotime oftheir scmap-assigned cell in the IDC P. berghei index. The top barrepresents the matched time point between the P. berghei RNAvelocity–derived transcription rates and P. falciparum transcription ratesreported by Painter et al. (21) using Pearson correlations. Vertical graylines mark peaks and troughs determined from the P. berghei data, asdescribed in the methods.

RESEARCH | RESEARCH ARTICLEon N

ovember 29, 2020

http://science.sciencem

ag.org/D

ownloaded from

Page 6: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

whose functions are poorly defined (15). Nearlyall life cycle stages showed highly variable ex-pression in at least one of such multigene fam-ilies (fig. S11). The largest of these families, pir,has a role in establishing chronic blood-stageinfections (16). Subsets of pir genes showedvariable expression in different stages, coupledwith distinctive upstream sequences (fig. S12).Such putative promoter architectures could de-fine stage-specific expression, with epigeneticcontrol defining which subset is expressed. In-terestingly, we found five coexpressed pairs ofpir genes in merozoites and rings (fig. S12). Pirgenes within each pair were split across differ-ent chromosomes but shared similar promoterarchitectures, with different pairs having dif-ferent promoters (fig. S12). Although the func-tion of these coexpression patterns is as yetunknown, such coexpression in a single cell canonly be detected using scRNA-seq, highlightinganother use of scRNA-seq toward identifyingnovel expression patterns.

Expanding the Malaria Cell Atlas acrosstechnologies and species

Droplet-based approaches to generate single-cell transcriptomes are less than 10% as costly asSmart-seq2 per cell, enabling the explorationof many more cells. To more deeply sampleparasites along the entire IDC, we used 10xGenomics’ droplet-based Chromium platformto simultaneously capture P. berghei and anotherparasite species, P. knowlesi, in a single inlet (17).We found that 6.34% of cells were dual-speciesdoublets, confirming a doublet rate as expectedfor mammalian cells (fig. S13). After removal of

doublets and additional quality control, we captured4884 P. berghei cells and 4237 P. knowlesi cells.We used canonical correlation analysis (CCA)

and scmap (Fig. 3A and fig. S13) (18, 19) to mapP. berghei life stages between the 10x andSmart-seq2 technologies. CCA clustering showedgood representation of all stages in both Smart-seq2 and 10x datasets within the IDC (fig. S13).Using scmap-cell, 94% of cells in the 10x datawere assigned to a Smart-seq2 cell with highconfidence, allowing us to align datasets (Fig.3A). The additional P. berghei 10x data increasedthe coverage of cells in our atlas and confirmedour ability to evaluate single cells characterizedby different methodologies. To account for thecontinuous cyclical nature of the data, we or-dered the 10x cells in pseudotime by fitting anellipse to the first two principal components andcalculating the angle around the center of thisellipse for each cell relative to a start cell (Fig. 3Bandmethods). To annotate the orientation of thecycle with real time, we correlated each single-cell transcriptome with published bulk referencedatasets and observed a high correspondencebetween bulk time point and pseudotime order(Fig. 3B and fig. S14).Additionally, we generated a 10x dataset

comprising 6737 cells from the IDC stages ofthe human parasite P. falciparum. We usedscmap-cell to assign each P. falciparum andP. knowlesi cell to the P. berghei 10x referenceindex built with 1:1 orthologs (data S3), thusenabling us to align the developmental trajec-tories of these three species (Fig. 3C and fig. S15).We used this alignment to examine patternsof transcription in all three species in a time-

resolved manner across the IDC. We calculatedRNA velocity (20), which measures the rate ofchange of mRNA molecule abundances, acrossthis deeply sampled set of Plasmodium para-sites. We found that the rate of change variesmarkedly over the IDC in P. berghei, with peakvelocity occurring in late rings, consistent withbulk studies of nascent RNA transcription (Fig.3C and fig. S16) (21). We compared patterns oftranscription as measured by RNA velocity at keytransitions across species (Fig. 3C and fig. S16).Overall, the alignment of developmental trajec-tories across species revealed generally similarpatterns of RNA velocity in the IDC of differentmalaria species; however, some life cycle stagesand genes showed more similarity than others(fig. S17, table S2, and data S4). These parasitespecies naturally infect vastly different hostsand have different IDC lengths (24 hours forP. berghei, 27 hours for P. knowlesi, and 48 hoursfor P. falciparum). RNA velocity analyses high-light that the pace of transcription varies to sup-port development to the next life stage in theIDC, with some similarity across these threespecies (Fig. 3C and fig. S16).Transcriptomic studies of both in vivo and in

vitro malaria parasites are often confounded bymultiple life stages within a single sample. TheMalaria Cell Atlas can be used to deconvolvebulk transcriptomic data and identify the spe-cific life stages that were present in a bulk RNA-seq sample. We demonstrated this with theuse of published bulk RNA-seq datasets fromP. berghei (22, 23) andP. falciparum (24) (fig. S18).Future bulk RNA-seq studies can use the atlasto identify differences in cell type composition,

Howick et al., Science 365, eaaw2619 (2019) 23 August 2019 5 of 11

Fig. 4. The Malaria Cell Atlas enables high-resolution mapping of field-derived single-cell transcriptomes of P. falciparum and P. malariae.(A) Phylogeny of Plasmodium showing the mammalian host and the stages found in circulation for each species. P. falciparum and P. berghei sequestertheir late stages in deep tissue, whereas other species have all morphological forms in circulation. Species in color were profiled in the atlas.(B) P. falciparum and P. malariae field-derived cells mapped onto the P. berghei 10x reference index using scmap-cell. The field-derived samplesmapped to developmental stages that were expected in circulation for each species.

RESEARCH | RESEARCH ARTICLEon N

ovember 29, 2020

http://science.sciencem

ag.org/D

ownloaded from

Page 7: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

potentially regressing these out to calculatemore accurate differential expression betweenconditions or samples. Furthermore, scRNA-seqdata generated by other groups (25) using adifferent droplet-based technology (Drop-seq)are also easily classified using scmap and theatlas data provided here (fig. S19). Together thesefindings confirm our ability to evaluate singlecells characterized by different methodologiesacross parasite species that divergedmore than12.5 million years ago (26).

The Malaria Cell Atlas as a reference forclinical samples

In vitro systems, although critical for experi-mental studies of Plasmodium parasites, areunlikely to fully capture the breadth of expres-sion variation of parasites circulating in natu-rally infected carriers. Moreover, there are sixphylogenetically distinct human-infecting spe-cies (Fig. 4A), several of which do not have anyexisting expression data and cannot be culturedin vitro. We therefore explored whether scRNA-seq data fromwild parasites taken straight frominfected people could be used to place wildparasites in developmental time using our atlas.We developed a methanol-based preservationprotocol that produced Smart-seq2 transcrip-tomeswith equivalent quality to unpreserved cellsin the lab (fig. S20). Next, we used the protocolto preserve samples from three naturally infectedasymptomatic carriers in Mbita, Kenya, whichwe then sorted and sequenced in the UK. Werecovered single-cell transcriptomes from allthree volunteers, and these field-collected sam-ples displayed quality similar to that of labora-tory samples (fig. S20). P. falciparum cells mappedto our atlas revealed male, female, and earlyasexual parasites (Fig. 4B), which are the ex-pected circulating stages for this species (Fig. 4A).Cells clustered by stage and not by donor, in-dicating that comparisons both within andbetween hosts are possible; this indicates thatscRNA-seq on field parasites will enable tran-scriptional characterization of natural infections.One of the volunteers was also infected withP. malariae, leading to the acquisition of tran-scriptomic data for this underexplored species.Notably, we observed late developmental stages;this was expected because, unlike P. falciparum,P. malariae late stages do not sequester in deeptissue (Fig. 4, A and B). As a proof of concept, wehave shown that parasite species that have pre-viously been inaccessible for expression anal-ysis can now be characterized by combiningscRNA-seq with the atlas.The Malaria Cell Atlas reference dataset com-

prises 15,858 parasite transcriptomes coveringevery life stage along the parasite’s life cycle atsingle-cell resolution, and spans different tech-nologies and different parasite species. The dataare freely accessible as a processed dataset andthrough a user-friendly web interface (27, 28).As such, this will be a key resource for the ma-laria community in the study of transcriptionalregulation and control of developmental progres-sion at the highest resolution. Because the Ma-

laria Cell Atlas provides a foundation for studyingthe biology of individual parasites directly fromtheir natural environment, it represents an impor-tant endeavor toward characterizing phenotypescritical for malaria control, including those relatedto pathogenicity, drug resistance, and transmissionbiology.

MethodsParasite culturing in vivo and in vitro

P. berghei parasites came from drug selectionmarker-free reporter line RMgm-928 that ex-pressedmCherry, under the control of the hsp70promoter, throughout the life cycle (29). Para-sites were propagated in female 6- to 8-week-oldTheiler’s Original outbredmice supplied by EnvigoUK. Mosquito infections were performed in 2- to5-day-old Anopheles stephensi mosquitoes.P. falciparum (3D7) was maintained in O+

blood using RPMI 1640 culturemedium (Gibco)supplementedwith 25mMHEPES (Sigma), 10mMD-glucose (Sigma), hypoxanthine (50 mg/liter,Sigma), and 10% human serum (obtained locallyin accordance with ethically approved protocols)in a mix containing 5% O2, 5% CO2, and 90% N2.P. knowlesi (strain A1-H.1) was maintained

in continuous culture (30) in O+ blood, usingRPMI 1640 culture medium (Gibco) supple-mented with 25 mM HEPES (Sigma), 22.2 mMD-glucose (Sigma), hypoxanthine (50 mg/liter,Sigma), 0.5% (wt/vol) Albumax II, and 10% horseserum, in a mix containing 5% O2, 5% CO2, and90% N2. Cultures were maintained for >6 weekswithout synchronization to ensure good repre-sentation of all stages in the IDC.HumanO+ erythrocyteswere supplied byNHS

Blood and Transplant, Cambridge, UK. All sam-ples were anonymized. Use of erythrocytes fromhuman donors for Plasmodium culture was ap-proved by the NHS Cambridgeshire 4 ResearchEthics Committee (REC reference 15/EE/0253)and theWellcome Sanger Institute HumanMa-terials and Data Management Committee.

Parasite isolation, cell sorting, and librarypreparation for Smart-seq2 scRNA-seqIsolation of extraerythrocytic forms fromHeLa cells

HeLa cells were cultured in DMEM supplementedwith 10% FCS. P. berghei sporozoites were pro-duced by homogenization of 50 dissected setsof salivary glands from female An. stephensimos-quitoes 22 days after an infectious blood meal.Sporozoites were counted on a hemocytometer,resuspended in DMEM, and added to an 80%confluent monolayer of HeLa cells at multi-plicity of infection of 1. The plate was spun at300g for 3 min and incubated at 37°C for 2 hours;cells were thenwashed twicewith PBS and placedback in complete medium. After 24 hours, cellswere split back at 70% confluency. Cells wereharvested by trypsinization 44 hours after infec-tion, washed once in PBS, and sorted immediately.

Isolation of blood-stage merozoites

P. berghei parasites were purified from an over-night (24 hours) 50-ml culture with 1 ml of in-

fected blood using a 55% Histodenz cushion(Sigma) as detailed in (31). Purified schizont stageswere stained with Hoechst 33342 (ThermoFisher)at a final concentration of 2.5 mg/ml for 10 minon ice, pelleted at 450g for 3 min, resuspendedin 1 ml of medium, and passed through a 1.2-mmfilter (Pall Life Sciences). Merozoites in the filteredfraction were sorted immediately.

Isolation of ring-stage parasites

A mouse infected with RMgm-928 was termi-nally bled by cardiac puncture using a syringecontaining heparin. The ~1-ml blood samplewas immediately transferred onto ice and stainedwith Hoechst 33342 (2.5 mg/ml) in PBS for 15 min(along with unstained controls for cell sorting).Cells were washed in RPMI and spun at 800g for3 min and washed once more with PBS. Parasiteswere then incubated in 0.02% saponin in PBSfor 3 min and then spun down at 1100g at 4°C.Parasites were washed once in PBS, pelleted at1100g for 3 min, and then resuspended in 1 mlof PBS prior to sorting.

Isolation of ookinetes from the blood bolus

Ookinetes were isolated from the blood bolusof An. stephensimidguts at 18 and 24 hours afterblood feeding from an RMgm-928–infectedmouse at approximately 5% parasitemia. Alateral incision was made along the dissectedmosquito midgut tissue to release the bloodbolus and remaining blood was rinsed out usinga syringe with PBS. Boluses from five midgutswere pooled, diluted in 500 ml of PBS, and stainedwith SYBR green. To discriminate ookinetes fromother stages in the blood bolus, a control feedwas performed using a HAP2–-mCherry–infectedmouse. HAP2 is essential for fertilization, so thebolus contained parasites but no ookinetes (32).This allowedus to enrich our sample for ookinetesby gating on the level ofmCherry and SYBR greenfluorescence (fig. S1, A to C).

Isolation of invading ookinetes and oocystsfrom the midgut

At 48 hours and 4 days post–blood meal, in-vading ookinetes and oocysts were isolated from10 pooled infected midguts. Dissected midgutswere disassociated in 200 ml of an enzymaticcocktail of collagenase IV (1 mg/ml) and elastase(1 mg/ml). The dissociation mixture was incu-bated at 30°C for 30minwith shaking at 300 rpm.Every 15min, tissuewasmechanically disrupted bypipetting up and down 40 times. To capture onlyinvading ookinetes at 48 hours, the remainingblood bolus was removed as described above. Asa control, midguts from mosquitoes that had fedon aHAP2–-mCherrymousewere disassociated toconfirm that no remnants of the blood meal andnoninvading parasites remained in the gut.

Isolation of salivary gland and injectedsporozoites

Salivary glands from 20 An. stephensi infectedwith RMgm-928 were dissected on day 26 post–blood meal. Sporozoites were released from theglands by homogenizing the samples manually

Howick et al., Science 365, eaaw2619 (2019) 23 August 2019 6 of 11

RESEARCH | RESEARCH ARTICLEon N

ovember 29, 2020

http://science.sciencem

ag.org/D

ownloaded from

Page 8: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

with a pestle in PBS. Samples were filtered witha 20-mm filter prior to sorting to remove largefragments of mosquito tissue. Simultaneously,female An. stephensimosquitoes from the sameinfectious feed were fed using a membranefeeding assay containing approximately 600 mlof fructose solution (80 g/liter) with 10% hu-man serum (filter-sterilized and heat-inactivated).Mosquitoes were exposed to the feeder for 12 min.After this, the remaining fructose/serum solutionwas removed from the feeder, and the presence ofsporozoites in this solution was microscopicallyconfirmed. Samples were then taken directly tocell sorting.

Preservation and isolation of cells fromfresh peripheral blood samples

Samples were procured in the district of Mbita(Kenya) from asymptomatic volunteers in ac-cordance with a study protocol reviewed andapproved by the KEMRI Scientific and EthicsReviewunit (KEMRI/RES/7/3/1). After screeningwith a rapid diagnostic test [SD BIOLINE Ma-laria Ag P.f/Pan (HRP-II/pLDH)], venous bloodsamples from infected volunteers were collectedin EDTA-vacutainers. For each sample, two dif-ferent purificationmethods were applied on 1 mlof the sample each to recover the later IDC andsexual stages on the one hand and the early IDCstages on the other. For the former, 1 ml of bloodwas resuspended in 5 ml of suspended anima-tion buffer (10 mM Tris, 150 mM NaCl, 10 mMglucose, pH 7.37) (33) and placed on a magneticcolumn (MACs, Miltenyi). Late-stage parasiteswere eluted, washed once in suspended anima-tion buffer, and resuspended in 200 ml of PBS.These were then fixed with 800 ml of methanol(Sigma) and preserved at –20°C. For the earlyIDC sample, another 1 ml was leucodepleted witha Plasmodipur filter (EuroProxima), washed twicein PBS, lysed twice with 0.15% saponin in PBS(Sigma), washed twice in PBS, and resuspendedin 200 ml of PBS. Samples were then also fixedwith 800 ml of methanol (Sigma) and preserved at–20°C. Both sample types were rehydrated withPBS, stained with Hoechst (2.5 mg/ml) in PBS for15 min, and washed once in PBS prior to sorting.

Cell sorting

All parasite cell sorting was conducted on anInflux cell sorter (BD Biosciences) with a 70-mmnozzle. The HeLa samples were sorted on a SonySH800 with a 100-mm nozzle chip. Parasiteswere sorted by gating on single-cell events andmCherry fluorescence (all stages) or Hoechstfluorescence (merozoites, field parasites). Allparasites were sorted into nuclease-free 96- or384-well plates (ThermoFisher) containing lysisbuffer as described (5). Sorted plates were spun at1000g for 10 s and immediately placed on dry ice.

Library preparation and sequencing

Reverse transcription, PCR, and library prepara-tion were performed as detailed (5). All librarieswere prepared in 96-well plates except a single384-well plate of late blood stages. In the lattercase, the lysis buffer volume was reduced to 2 ml,

and the elongation temperature of the PCR wasreduced to 68°C. Cells were multiplexed up to384 and sequenced on a single lane of HiSeq2500 v4 with 75-bp paired-end reads.

Parasite preparation and loading of10x scRNA-seqParasite preparation

For P. berghei samples, blood was obtained byterminal bleed and passed through a prewettedPlasmodipur syringe filter (Europroxima) tofilter out white blood cells prior to culturing.Three cultures were generated: cultured for30 min, 10 hours, and 20 hours at 36.5°C withshaking at 65 RPM. Cultures were smeared priorto harvesting in order to ascertain parasitemia.After harvesting, the total number of red bloodcells in each sample was counted using a dis-posable hemocytometer. This count was corrobo-rated using a Countess cell counter. The numberof infected red blood cells in each culture wasused as a cell count, and cells were pooled 1:1:1from the three time points and kept on ice. ForP. knowlesi, the parasitemia of the cultureddesynchronized parasites was measured andthen cells were harvested by centrifugation at450g for 3 min at 4°C. Supernatant was re-moved and parasites were washed twice in PBSbefore resuspension in PBS. The concentrationof red blood cells was then calculated bymanualhemocytometer, before calculating the finalinfected red blood cell concentration using theparasitemia. Cells were then pooled 1:1 with theP. berghei cell mixture described above in orderto run a dual-species 10x analysis to evaluatedoublet rates. P. falciparum parasites were pre-pared in the same manner as P. knowlesi butwere run on their own 10x inlet.

10x loading

Cells were loaded according to manufacturer’sinstructions to recover 10,000 cells per inlet. 10xChromium Single Cell 3′ Library v2 chemistrywas used and libraries were prepared accordingto manufacturer’s instructions. Each 10x input li-brary was sequenced across twoHiseq 2500 RapidRun lanes using 75-bp paired-end sequencing.

Bulk transcriptomics

Three P. berghei samples were prepared for bulkRNA-seq including early asexuals, late asexuals,and ookinetes. Mice infected with hsp70p:mCherryP. bergheiwere terminally bled by cardiac punc-ture using a syringe containing heparin. For thetwo asexual samples, the blood was treated withammonium chloride to remove uninfected ery-throcytes (34) either straight after the bleed(early) or after 24 hours of ex vivo culture (late).For the ookinete sample, the blood was culturedfor 24 hours as described (35). RNAwas extractedwith TriZol according to the manufacturer’s re-commendations and assayed with an AgilentRNA 6000 Nano assay, and transcriptomes weregenerated as described. A modified RNA-seqprotocol was used. PolyA+ RNA (mRNA) wasselected using magnetic oligo-d(T) beads. Re-verse transcription using Superscript III (Life)

was primed using oligo d(T) primers; second-strand cDNA synthesis included dUTP. The re-sulting cDNA was fragmented using a CovarisAFA sonicator. A “with-bead” protocol was usedfor dA-tailing, end repair, and adapter ligationusing “PCR-free” barcoded sequencing adaptors(NEB) (36). After two rounds of SPRI cleanup(Agencourt), the libraries were eluted in EBbuffer andUSER enzymemix (NEB)was used todigest the second-strand cDNA, generating direc-tional libraries. The libraries were quantified byqPCR and sequenced on an Illumina HiSeq 2500.

Mapping and generation of expressionmatrices for scRNA-seq transcriptomesSmart-seq2 mapping

Single-cell Plasmodium transcriptomes weremapped as reported previously (5). Briefly,trimmed reads were mapped using HISAT2(v 2.0.0-beta) (37) to the P. berghei v3 genome(October 2016), and using STAR (v 2.5.0a) to theP. falciparum v3 (January 2016) and P. malariaev1 (March 2018) genomes using default param-eters (38). Reads were summed against genesusing HTseq (v 0.6.0) (39). For the coexpressionof HeLa cells and liver-stage parasite analysis,both HeLa cells and parasites were mapped torespective genomes with STAR (v 2.5.1b) usingdefault parameters (38).

10x data alignment, cell barcodeassignment, and UMI counting

Cell Ranger single-cell software (version 2.1.0)was used to process sequencing reads, assign-ing each read to a cell barcode and UMI usingstandard parameters (17). After barcode assign-ment, the cDNA insert read was aligned usingCell Ranger (v 2.1.0) to a combined referencegenome ofP. knowlesi (March 2014) andP. berghei(July 2015), and the P. falciparum run was alignedto the 3D7 genome v3 (January 2016). These ref-erence genomes were all obtained from www.sanger.ac.uk/resources/downloads/protozoa/.

Filtering and normalization ofscRNA-seq dataSmart-seq2 filtering and normalization

Poor-quality cells were identified on a per-stagebasis according to the distribution of the num-ber of genes per cell, given the high variability ofgenes detected between stages (fig. S2, A and B).Cells with fewer than 1000 genes per cell and2500 reads per cell were removed from the liver-stage parasites, trophozoites, male and femalegametocytes, ookinetes, ookinetes/oocysts, andoocyst stages. Cells with fewer than 500 genesper cell and 2500 reads per cell were removedfrom schizonts and injected sporozoites. Cellswith fewer than 40 genes per cell and 1000 readsper cell were removed from merozoites, rings,and gland sporozoites (fig. S2 and table S1).Additionally, we removed genes from furtheranalysis that were detected in fewer than twocells across the entire dataset. The final datasetcontained 1787 high-quality single cells from 1982sequenced cells and 5156 genes out of 5245 geneswith annotated transcripts. Transcriptomeswere

Howick et al., Science 365, eaaw2619 (2019) 23 August 2019 7 of 11

RESEARCH | RESEARCH ARTICLEon N

ovember 29, 2020

http://science.sciencem

ag.org/D

ownloaded from

Page 9: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

normalized with the weighted TMM method(40). Cells were normalized either all togetheror in five groups containing biologically similarstages; groups included IDC, liver-stage, game-tocytes, ookinetes/oocysts, and sporozoites. Visualinspection of the relative expression plot (fig. S2D)showed little difference between normalizationby biological group versus all together. Unlessotherwise specified, further analysis was doneon cells normalized by biological group.

10x filtering and normalization

For P. berghei, the output filtered matrix fromCell Ranger was read into Seurat (v 2.3.4) (18).Low-quality P. berghei cells with fewer than 230detected genes were removed from further anal-ysis. Initial inspection of filtered cells in theP. knowlesi and P. falciparum datasets showedthat early-stage and late-stage IDC parasiteswere missing. These stages express fewer genesper cell relative to later stages based on our Smart-seq2 data, and we have previously observed alower detection of genes per cell in P. falciparum(5), suggesting that these cells may have beenremoved by Cell Ranger’s default thresholding.Using the raw output matrices for these species,we adjusted thresholds to retain cells with >100genes per cell for P. falciparum and >150 genesper cell for P. knowlesi. Intraspecies doubletswere identified and removed from all threespecies using doubletFinder (v 1.0.0) (41). Forthe P. berghei/P. knowlesi run, we identifiedinterspecies doublets as cells that contained>50 UMIs that mapped to each species (fig.S13A). The expected intraspecies doublet ratewas calculated on the basis of this interspeciesdoublet rate, the relative proportion of each spe-cies, and the additional quality control thresh-olding. For P. falciparum, the intraspeciesdoublet rate was calculated from the expecteddoublet rate table provided by 10x Genomics.Thus, the numbers of intraspecies doublet cellsremoved were as follows: P. berghei = 200,P. knowlesi = 287, P. falciparum = 530 (fig.S13B). Doublets do not show a stage-specific bias(fig. S13B).

Single-cell transcriptome analysis ofSmart-seq2 dataCell clustering and projection

For timepoints where a heterogeneous popula-tion of stages was collected, we used k-meansclustering using SC3 (version 1.7.7) to delineatestages and confirmed their classification basedon known marker genes (42). This method wasused for classification of males, females, troph-ozoites, and schizonts, as well as ookinetes andoocysts (figs. S3 and S4). For visualization in twodimensions, we performed UMAP (6) with the py-thon package umap version 0.1.1 using the correla-tion distance metric, k–nearest neighbors of10, min_dist of 1, spread of 2, and bandwidth of 1.

HeLa cell quality control and cell-cycleanalysis

We performed initial filtering to identify themost robustly expressed genes across single cells.

Genes were required to be expressed in >30 cells(of 164 cells) and cells needed to express >500genes in both parasite and matched HeLa cellsto be retained. This resulted in 163matched cellswith 4480 parasite genes and 8059 HeLa cellgenes. We performed clustering of single HeLaand parasite cells independently using either allhighly variable genes or subsets of annotatedcell cycle genes. The highly variable genes wereidentified by plotting the averaged gene expres-sion against gene dispersion [similar to Seurat(18)]. Louvain clustering was performed on sin-gle HeLa cells using only cell cycle genes, re-sulting in four louvain groups (fig. S6D). Thesegroups are highly indicative of cell cycle progres-sion starting from group 0 (G0/G1) to group1 (G1S) to group 3 (G2) to group 4 (G2M).

Pseudotime

To order cells in a developmental trajectory, wereconstructed pseudotime using SLICER (43).Variable genes were identified within SLICERand then selected to build the trajectory basedon a neighborhood variance that identifies genesthat vary smoothly across the cell sets. SLICERwas run independently on three groups of cells:(i) the liver-stage parasites, (ii) the entire IDC(merozoites, rings, trophozoites, and schizonts),and (iii) the ookinete-to-oocyst transition (bolusookinetes, ookinete/oocyst, and oocyst). We as-sessed the performance of the algorithm by con-firming that the pseudotime order matched theground truth time point collections and expres-sion of known marker genes over development(e.g., fig. S4E). To order all cells across the lifecycle, we compiled these pseudotime orders withknown timing of other stages that did not show adevelopmental signature (mature gametocytesand sporozoites).

Differential expression

Differential expression over the ookinete-to-oocysttransition and between gland and injected spo-rozoites was performed in monocle using thedifferentialGeneTest function (44).

Gene clustering and visualization ofSmart-seq2 data

The gene count matrix was normalized by di-viding by themean counts for each gene and logscaling. This was done to reduce the amount towhich gene clusters were driven by total geneexpression and instead focus on the patternof expression across cells. A k-NN graph wasformed on the gene-normalized expressionmatrix with the Nearest Neighbors subpackageof python’s scikitlearn version 0.19.2 with pa-rameters of k = 5 and a manhattan distancemetric (45). We chose a k value of 5 because itwas smaller than the smallest cluster we wereinterested in detecting, and the graph appearedrobust from k = 3 to k = 20. We then performedspectral graph clustering on this k-NN graphusing the SpectralClustering subpackage of py-thon’s scikitlearn version 0.19.2 (8, 46). Thegraph was visualized in Gephi with the force-atlas 2 graph layout algorithm in linlog mode

to better show the clustering structure of thedata (47, 48). Gene Ontology analysis was con-ducted with the dedicated PlasmoDB tool (49).The top cited gene in each cluster was iden-tified using the literature tool on PlasmoDB (49).

Marker genes

Marker genes for each stage were identified intwo ways. First, differentially expressed geneswere calculated using the findMarkers func-tion in scran (50). This function performs aWelch t test between pairs of stages and thenidentifies genes that are uniquely expressed inthat cluster (pval.type = “all,” direction = “up”).This method was used to identify markers foreach canonical stage, as well as marker geneswithin each host (mouse versus mosquito) andeach cellular strategy (invasive, replicative, andsexual forms) (data S1). Second, marker geneswere identified by defining a core set of genesfor each stage as all genes that are expressed inmore than 50% of cells. To avoid bias from thenumber of cells sampled within a stage, 60 cellswere randomly selected per stage (data S2). Wedefined the unique core transcriptome as genesfrom each stage’s core that were unique to thatstage’s core (i.e., not also found in more than50% of cells from any other stage) (data S1).

Motif discovery

Motif discovery was performed using DREME,which searches for short (8 bp)motifs expressedas regular expressions (consensus sequencesallowing for wildcards but not variable lengthgaps) in a given set of sequences (51). The 1000 bpupstream of the start codon for each gene de-tected in the Smart-seq2 dataset was used in theanalysis. For each cluster, the input dataset wasthe upstream regions of each gene within that clus-ter and the negative set was the upstream region ofgenes that were not in that cluster. Clusters 18, 19,and 20 were not included in this analysis becausethey contain many paralogous genes from genefamilies with large duplicated upstream regions.The top motif of each cluster was compared tomotifs from (12, 13, 52, 53) using Tomtom (54).

Analysis of development-independent geneexpression variability

Almost all genes in Plasmodium genomes varyin expression over the life cycle. This is mainlythought to be related to the development of theparasite as it transitions between different lifestages. We first identified highly variable genesin each stage independently. In the Smart-seq2data, we used a general linear model to regressout the effect of pseudotime within developingstages [liver-stage exo-erythrocytic forms (EEFs),merozoites, rings, trophozoites, schizonts, ookinetes,and oocysts]. We preserved the mean expres-sion of each gene by adding the predicted valueof themean to the residuals of the general linearmodel; in addition, we set any negative cor-rected values to zero in order to preserve thenon-negativity of gene expression values. Finally,because the correction often shifted zeros tovalues only slightly above zero, we rounded these

Howick et al., Science 365, eaaw2619 (2019) 23 August 2019 8 of 11

RESEARCH | RESEARCH ARTICLEon N

ovember 29, 2020

http://science.sciencem

ag.org/D

ownloaded from

Page 10: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

values down in order to meet the assumptions ofthe M3Drop model. We then used M3Drop (55)to identify genes with remaining heterogeneity[false discovery rate (FDR) ≤ 0.05], adjusted formean expression level. Enrichment of each genefamily within each stage was determined usingthe hypergeometric test with correction by FDRand a cutoff of 0.05.We examined pir gene promoter architectures

to determine whether particular gene expressionpatternsmight be driven by transcription factors.First, we identified the 5′ UTR and upstreamintergenic (putative promoter) regions of the pirgenes shown in fig. S12. This was done manuallyby browsing the genome and referring to threeP. berghei bulk RNA-seq samples of mixed earlyand late asexual stages as well as ookinete stages.Illumina reads from these libraries weremappedto the P. berghei v3 genome sequence usingHISAT2 v2.0.0 (37), with–rna-strandness RF–max-intronlen 5000. The data were viewed usingArtemis v18.0.0 (56). 5′UTRs were defined as theregion between the start codon and where RNA-seq coverage dropped to zero in at least two ofthe three samples. Upstream intergenic regionswere defined from the start of the 5′ UTR to thenext, upstream increase in coverage from one ormore RNA-seq libraries. The upstream inter-genic regions were BLASTed against each other(blastall 2.2.25, -p blastn -e 1e-20). The sequencesinvolved in each hit were extracted, excludingthose overlapping others with lower E-values.These sequences were then BLASTed againsteach other (blastall 2.2.25, -p blastn -e 0.01) andthe resulting similarity matrix was used to clus-ter them with MCL v12-068 (57) with the in-flation parameter set to 1.4. Sequences werecollected together based on the clustering andaligned using MUSCLE v3.8.31 (58). The align-ments were then trimmed by identifying highlyconserved regions. Alignments in nonoverlap-ping windows of 10 nucleotides were evaluated,counting the proportion of sequences that wereungapped. An alignment position was called asgood if ≥70% of sequences were ungapped atthat position. A window of 10 nucleotides wascalled as a block if it contained no more thanthree bad positions. If there was more thanone bad block in a row, a conserved region wasended. Only the longest conserved region froman alignment was kept. Sequences that began orended within the conserved region were thenremoved. These alignments were used to buildnucleotide profile hiddenMarkovmodels (HMMs)using HMMer i1.1rc3 (59). The models were thensearched against the P. berghei v3 genome se-quence, also using HMMer, to identify furthermembers of the sequence families. Each hit wasassociated with the nearest downstream protein-coding gene. We identified eight upstream in-tergenic (promoter) sequence families associatedwith pir genes that we called A, C, D, F, G, H, I,and J (fig. S12).

RNA velocity

For each IDC (ring, trophozoite, schizont) Smart-seq2 cell that passed quality control, the exonic,

intronic, and mixed reads were counted usingRNA velocity (20). Intronic and mixed readswere combined to estimate the total unsplicedreads in the data, whereas purely exonic readswere assumed to represent spliced transcripts.Cells were split by life cycle stage as for thepseudotime analysis, the expected ratio of splicedto unspliced reads for each gene was fit usingRNA velocity, and residuals for each cell wereestimated for each group independently. Toensure that we only considered genes that werefit well by the RNA velocity model, we required aminimum slope of 0.1 (increased from the defaultsetting of 0.05) and a minimum correlation be-tween spliced andunspliced reads of 0.5 (increasedfrom the default setting of 0.05). To improve fits,we used the cells with the top and bottom 7.5% ofexpression levels for the fitting. Genes where morethan 90% of residuals were either positive ornegative were excluded as poorly fit genes. Thisresulted in 1345 genes × 548 cells for the IDC.

10x single-cell transcriptome analysisCell clustering

To identify male and female gametocytes in theP. berghei data, data were log-normalized, andclusters were identified using the shared nearestneighbor modularity optimization-based cluster-ing algorithm in the FindClusters() function inSeurat (18). Two clusters corresponded to game-tocytes based on expression of marker genes.These clusters were removed when comparingthe data to the Smart-seq2 via CCA, as well asfor the pseudotime assignment, and alignmentof the three datasets in scmap (Fig. 3). The threespecies IDC PCAs were generated on TMM nor-malized data in scater (version 1.6.3) (60). Ad-ditionally, we identified clusters using the CCAin Seurat to compare the twomethods of scRNA-seq (Smart-seq2 and 10x) (18). We identified nineclusters of cells that had good representationin both datasets (fig. S13C). One cluster, 8, con-tained only 15 cells across the two datasets andwas removed from further analyses.

SCmap

We used scmap (version 1.1.5) (19) to comparedatasets. We built three sets of cell indices thatcould be queried with the scmapCell() functionthat would allow each individual cell in thequery dataset to bemapped to a reference index(19). To compare the Smart-seq2 and 10x data,we built an index of the blood-stage Smart-seq2data (including gametocytes) and mapped thefull P. berghei 10x dataset (including gameto-cytes) onto it. Because the IDC consists of acontinuous set of cell stages and not discreteclusters, we modified the cell assignment methodin scmap: Cells were assigned based on the topnearest neighbor. If the top cell had a cosinesimilarity of greater than 0.5, the query cell wasassigned to that indexed cell along with its sup-porting metadata (cluster assignment, bulk pre-diction, pseudotime value). Using this cosinesimilarity threshold, 94% of 10x P. berghei cellswere assigned to a cell in the SS2 P. bergheireference dataset.

To align the IDC trajectories across the three10x datasets, we first compiled a set of one-to-one orthologs among 10 Plasmodium species(P. berghei, P. knowlesi, P. falciparum, P.malariae,P. ovale, P. vivax, P. gallinaceum, P. yoelii,P. chabaudi, P. cynomolgi) from OrthoMCL(61) (data S3). Using these orthologs, we builta scmap reference index that contained allP. berghei 10x IDC cells (gametocytes removed).Wemapped both the P. falciparum andP. knowlesidata to this ortholog reference index. In addi-tion to identifying the top nearest neighbor cell,we were able to incorporate information fromthe top three nearest neighbors to assign eachcell based on the principal component space. Todo this, we took amean of the first two principalcomponents of the top three nearest neighbors.Given this coordinate assignment, we locatedthe nearest cell on the PCA and assigned thequery cell to this index cell. If all three of thenearest neighbors had a cosine similarity of0.3, then the query cell was given an assignment.With this lower cosine similarity threshold toaccount for cross-species differences, we wereable to assign more than 96% of P. falciparumand 99% of P. knowlesi cells to a P. bergheiindex cell.Finally, to map single-cell samples from the

field, we built a 1:1 ortholog index of the com-plete 10x P. berghei dataset, including thegametocytes that were excluded for the IDCevaluations. We used this reference because(i) it fully represents the IDC and mature game-tocytes, and (ii) it originates from an in vivosystem like the volunteer cells. Because thegametocyte data were more sparse, the IDCcell assignment was based on the top nearestneighbor alone along with a cosine similaritythreshold of 0.4. Using this method, we were ableto map 13 P. malariae cells and 22 P. falciparumcells, assigning each cell to a developmental time.

“Clock” pseudotime

For the three 10x datasets, pseudotime aroundthe IDC was calculated by fitting an ellipse tothe data projected into the first two principalcomponents using direct least squares (Fig. 3Cand fig. S15). Angles around the center of thisellipse were calculated for each cell and orientedto a starting cell, whichwas defined using knownmarkers. To align the three species in pseudo-time, 10x data from P. knowlesi and P. falciparumwere projected directly onto the P. berghei ref-erence using scmap (19) and cells were giventhe pseudotime of their P. berghei assigned cell.This “clock” pseudotime was aligned to real-timeprogression through the IDC using twomethods.First, bulk RNA-seq data (22) from synchronizedP. berghei parasites across 12 equally spaced timepoints around the IDC were projected onto oursingle-cell reference and their position in the firsttwo principal components’ space was estimatedfrom the average of their three nearest neighbors.These principal components’ locations were usedto calculate a respective pseudotime for each bulksample. Second, we mapped our single-cell RNA-seq transcriptomes onto the densely sampled

Howick et al., Science 365, eaaw2619 (2019) 23 August 2019 9 of 11

RESEARCH | RESEARCH ARTICLEon N

ovember 29, 2020

http://science.sciencem

ag.org/D

ownloaded from

Page 11: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

P. falciparum bulk RNA-seq time course gen-erated by Painter et al. (21). Genes weremappedacross species using 1:1 orthologs (see above)and log-normalized RNA velocity-derived tran-scription rateswerematched to the log-normalizedtranscription rates reported in (21) using Pearsoncorrelations.

RNA velocity

We also ran RNA velocity on the 10x single-cellRNA-seq data from each species independently.Cells were filtered as described above; genes werefiltered to exclude those that did not have at leastone unspliced transcript in at least 10 cells andone spliced transcript in at least 20 cells. Toaccount for the high number of zeros present in10x data, we increased the k for the cell andgene k-NN smoothing included in RNA velocityto 50 and 5, respectively. To ensure good fits tothe genes, we required a minimum slope of 0.2and minimum correlation of 0.2 and used thetop and bottom 20% of cells for the fitting. Inaddition, we excluded poorly fit genes as above.After this filtering, P. knowlesi data contained1235 genes × 4237 cells, P. berghei data contained1368 genes × 4763 cells, and P. falciparum datacontained 645 genes × 6737 cells.

Comparing transcriptional waves

In the transcriptional waves through the IDC inP. berghei, we called 12 peaks and troughs byfitting a smoothed curve to the transcriptionalrate through pseudotime using smooth.splinein R (Fig. 3C), with smoothing parameter (spar)equal to 0.9. We then identified all inversionpoints in the slope of consecutive points in thesmoothed curve. For each of the 13 segmentsdefined by these peaks and troughs, we usedpiecewise linear regression to test for significantincreases or decreases in RNA velocity in eachof the three species (Bonferroni multiple testingcorrection). In addition, we used piecewise lin-ear regression on the individual genes for sig-nificant increases/decreases in RNA velocity acrosseach segment in each species (5% FDR). To ex-amine the conservation of the genes involved inthe transcriptional waves through the IDC, wematched one-to-one orthologs of the genes usedin the RNA velocity analyses across all threespecies. We plotted a heat map of the slopes foreach of the 306 genes that were orthologousacross all three species and passed RNA velocityquality control filters, and counted the numberof genes that had a consistent direction with thepattern observed in P. berghei. Time points werematched across species by determining whichP. berghei timepointhad thehighest proportionofgenes with slopes in the same direction as eachP. knowlesi or P. falciparum time point. Any timepoint where fewer than 10 genes had consistentslopes with any P. berghei time point was con-sidered unmatched. Significance of the matcheswas evaluated using a Binomial test of whethersignificantly more genes had a slope agreeingwith the P. berghei reference than disagreeing.We used this test because is was robust to thefact that the numbers of cells per time point

were not consistent across species, particularlyfor the early time points, which may cause somany of the DE genes to lose significance inthe other species because of low numbers ofcells (low power for the statistical test).

Deconvolution of bulk transcriptomicsamples using scRNA-seq

The P. berghei 10x data were used as a referenceandmarker genes were called for each cluster inSeurat (v 2.3.4) using the Standard AUC clas-sifier method. Genes that were not detected in>40%of cells and negativemarkerswere excluded.The top 10marker genes in each cluster, by power,were identified and used for deconvolution (n =107). BSeq-sc (v 1.0) (62) was then used to estimatethe proportion of cell types in each bulk sampleusing the default analysis pipeline.

REFERENCES AND NOTES

1. A. A. Kolodziejczyk, J. K. Kim, V. Svensson, J. C. Marioni,S. A. Teichmann, The technology and biology of single-cell RNAsequencing. Mol. Cell 58, 610–620 (2015). doi: 10.1016/j.molcel.2015.04.005; pmid: 26000846

2. X. Han et al., Mapping the Mouse Cell Atlas by Microwell-Seq.Cell 172, 1091–1107.e17 (2018). doi: 10.1016/j.cell.2018.02.001; pmid: 29474909

3. C. T. Fincher, O. Wurtzel, T. de Hoog, K. M. Kravarik,P. W. Reddien, Cell type transcriptome atlas for the planarianSchmidtea mediterranea. Science 360, eaaq1736 (2018).doi: 10.1126/science.aaq1736; pmid: 29674431

4. T. D. Otto et al., A comprehensive evaluation of rodent malariaparasite genomes and gene expression. BMC Biol. 12, 86(2014). doi: 10.1186/s12915-014-0086-0; pmid: 25359557

5. A. J. Reid et al., Single-cell RNA-seq reveals hiddentranscriptional variation in malaria parasites. eLife 7, e33105(2018). doi: 10.7554/eLife.33105; pmid: 29580379

6. L. McInnes, J. Healy, J. Melville, UMAP: Uniform ManifoldApproximation and Projection for Dimension Reduction.arXiv 1802.03426 [stat.ML] (6 December 2018).

7. K. K. Hanson, S. March, S. Ng, S. N. Bhatia, M. M. Mota, In vitroalterations do not reflect a requirement for host cell cycleprogression during Plasmodium liver stage infection.Eukaryot. Cell 14, 96–103 (2015). doi: 10.1128/EC.00166-14;pmid: 25416236

8. J. Shi, J. Malik, Normalized cuts and image segmentation.IEEE Trans. Pattern Anal. Mach. Intell. 22, 888–905 (2000).doi: 10.1109/34.868688

9. T. Kariu, T. Ishino, K. Yano, Y. Chinzei, M. Yuda, CelTOS, anovel malarial protein that mediates transmission to mosquitoand vertebrate hosts. Mol. Microbiol. 59, 1369–1379 (2006).doi: 10.1111/j.1365-2958.2005.05024.x; pmid: 16468982

10. M. J. Stewart, R. J. Nawrot, S. Schulman, J. P. Vanderberg,Plasmodium berghei sporozoite invasion is blocked in vitro bysporozoite-immobilizing antibodies. Infect. Immun. 51,859–864 (1986). pmid: 3512436

11. E. Bushell et al., Functional Profiling of a Plasmodium GenomeReveals an Abundance of Essential Genes. Cell 170, 260–272.e8 (2017). doi: 10.1016/j.cell.2017.06.030; pmid: 28708996

12. K. Modrzynska et al., A Knockout Screen of ApiAP2 GenesReveals Networks of Interacting Transcriptional RegulatorsControlling the Plasmodium Life Cycle. Cell Host Microbe 21,11–22 (2017). doi: 10.1016/j.chom.2016.12.003;pmid: 28081440

13. J. A. Young et al., In silico discovery of transcription regulatoryelements in Plasmodium falciparum. BMC Genomics 9, 70(2008). doi: 10.1186/1471-2164-9-70; pmid: 18257930

14. J. Guizetti, A. Scherf, Silence, activate, poise and switch!Mechanisms of antigenic variation in Plasmodium falciparum.Cell. Microbiol. 15, 718–726 (2013). doi: 10.1111/cmi.12115;pmid: 23351305

15. N. Rovira-Graells et al., Transcriptional variation in the malariaparasite Plasmodium falciparum. Genome Res. 22, 925–938(2012). doi: 10.1101/gr.129692.111; pmid: 22415456

16. T. Brugat et al., Antibody-independent mechanisms regulatethe establishment of chronic Plasmodium infection. Nat.Microbiol. 2, 16276 (2017). doi: 10.1038/nmicrobiol.2016.276;pmid: 28165471

17. G. X. Y. Zheng et al., Massively parallel digital transcriptionalprofiling of single cells. Nat. Commun. 8, 14049 (2017).doi: 10.1038/ncomms14049; pmid: 28091601

18. A. Butler, P. Hoffman, P. Smibert, E. Papalexi, R. Satija,Integrating single-cell transcriptomic data across differentconditions, technologies, and species. Nat. Biotechnol. 36,411–420 (2018). doi: 10.1038/nbt.4096; pmid: 29608179

19. V. Y. Kiselev, A. Yiu, M. Hemberg, scmap: Projection ofsingle-cell RNA-seq data across data sets. Nat. Methods 15,359–362 (2018). doi: 10.1038/nmeth.4644; pmid: 29608555

20. G. La Manno et al., RNA velocity of single cells. Nature 560,494–498 (2018). doi: 10.1038/s41586-018-0414-6;pmid: 30089906

21. H. J. Painter et al., Genome-wide real-time in vivotranscriptional dynamics during Plasmodium falciparumblood-stage development. Nat. Commun. 9, 2656 (2018).doi: 10.1038/s41467-018-04966-3; pmid: 29985403

22. R. Hoo et al., Integrated analysis of the Plasmodium speciestranscriptome. EBioMedicine 7, 255–266 (2016). doi: 10.1016/j.ebiom.2016.04.011; pmid: 27322479

23. L. M. Yeoh, C. D. Goodman, V. Mollard, G. I. McFadden,S. A. Ralph, Comparative transcriptomics of female and malegametocytes in Plasmodium berghei and the evolution of sex inalveolates. BMC Genomics 18, 734 (2017). doi: 10.1186/s12864-017-4100-0; pmid: 28923023

24. H. J. Lee et al., Integrated pathogen load and dualtranscriptome analysis of systemic host-pathogen interactionsin severe malaria. Sci. Transl. Med. 10, eaar3619 (2018).doi: 10.1126/scitranslmed.aar3619; pmid: 29950443

25. A. Poran et al., Single-cell RNA sequencing revealsa signature of sexual commitment in malaria parasites.Nature 551, 95–99 (2017). doi: 10.1038/nature24280;pmid: 29094698

26. U. Böhme et al., Complete avian malaria parasite genomesreveal features associated with lineage-specific evolution inbirds and mammals. Genome Res. 28, 547–560 (2018).doi: 10.1101/gr.218123.116; pmid: 29500236

27. V. M. Howick, A. J. Reid, vhowick/MalariaCellAtlas: Malaria CellAtlas (2019); http://dx.doi.org/10.5281/zenodo.2843883.

28. Malaria Cell Atlas; www.sanger.ac.uk/science/tools/mca/mca/.29. P.-C. Burda et al., A Plasmodium phospholipase is involved in

disruption of the liver stage parasitophorous vacuolemembrane. PLOS Pathog. 11, e1004760 (2015). doi: 10.1371/journal.ppat.1004760; pmid: 25786000

30. R. W. Moon et al., Adaptation of the genetically tractablemalaria pathogen Plasmodium knowlesi to continuous culturein human erythrocytes. Proc. Natl. Acad. Sci. U.S.A. 110,531–536 (2013). doi: 10.1073/pnas.1216457110;pmid: 23267069

31. A. R. Gomes et al., A genome-scale vector resource enableshigh-throughput reverse genetic screening in a malariaparasite. Cell Host Microbe 17, 404–413 (2015).doi: 10.1016/j.chom.2015.01.014; pmid: 25732065

32. Y. Liu et al., The conserved plant sterility gene HAP2 functionsafter attachment of fusogenic membranes in Chlamydomonasand Plasmodium gametes. Genes Dev. 22, 1051–1068 (2008).doi: 10.1101/gad.1656508; pmid: 18367645

33. R. Carter, D. H. Chen, Malaria transmission blocked byimmunisation with gametes of the malaria parasite. Nature263, 57–60 (1976). doi: 10.1038/263057a0; pmid: 986561

34. W. J. Martin, J. Finerty, A. Rosenthal, Isolation of Plasmodiumberghei (malaria) parasites by ammonium chloride lysis ofinfected erythrocytes. Nat. New Biol. 233, 260–261 (1971).doi: 10.1038/newbio233260a0; pmid: 5288121

35. I. Siden-Kiamos et al., Plasmodium berghei calcium-dependentprotein kinase 3 is required for ookinete gliding motility andmosquito midgut invasion. Mol. Microbiol. 60, 1355–1363(2006). doi: 10.1111/j.1365-2958.2006.05189.x;pmid: 16796674

36. I. Kozarewa et al., Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assemblyof (G+C)-biased genomes. Nat. Methods 6, 291–295 (2009).doi: 10.1038/nmeth.1311; pmid: 19287394

37. D. Kim, B. Langmead, S. L. Salzberg, HISAT: A fast splicedaligner with low memory requirements. Nat. Methods 12,357–360 (2015). doi: 10.1038/nmeth.3317; pmid: 25751142

38. A. Dobin et al., STAR: Ultrafast universal RNA-seq aligner.Bioinformatics 29, 15–21 (2013). doi: 10.1093/bioinformatics/bts635; pmid: 23104886

39. S. Anders, P. T. Pyl, W. Huber, HTSeq—A Python frameworkto work with high-throughput sequencing data. Bioinformatics31, 166–169 (2015). doi: 10.1093/bioinformatics/btu638;pmid: 25260700

Howick et al., Science 365, eaaw2619 (2019) 23 August 2019 10 of 11

RESEARCH | RESEARCH ARTICLEon N

ovember 29, 2020

http://science.sciencem

ag.org/D

ownloaded from

Page 12: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

40. M. D. Robinson, A. Oshlack, A scaling normalization methodfor differential expression analysis of RNA-seq data.Genome Biol. 11, R25 (2010). doi: 10.1186/gb-2010-11-3-r25;pmid: 20196867

41. C. S. McGinnis, L. M. Murrow, Z. J. Gartner, DoubletFinder:Doublet Detection in Single-Cell RNA Sequencing Data UsingArtificial Nearest Neighbors. Cell Syst. 8, 329–337.e4 (2019).doi: 10.1016/j.cels.2019.03.003; pmid: 30954475

42. V. Y. Kiselev et al., SC3: Consensus clustering of single-cellRNA-seq data. Nat. Med. 14, 483–486 (2017). doi: 10.1038/nmeth.4236; pmid: 28346451

43. J. D. Welch, A. J. Hartemink, J. F. Prins, SLICER: Inferringbranched, nonlinear cellular trajectories from single cellRNA-seq data. Genome Biol. 17, 106 (2016). doi: 10.1186/s13059-016-0975-3; pmid: 27215581

44. C. Trapnell et al., The dynamics and regulators of cell fatedecisions are revealed by pseudotemporal ordering of singlecells. Nat. Biotechnol. 32, 381–386 (2014). doi: 10.1038/nbt.2859; pmid: 24658644

45. F. Pedregosa et al., Scikit-learn: Machine Learning in Python.J. Mach. Learn. Res. 12, 2825–2830 (2011).

46. W. Saelens, R. Cannoodt, Y. Saeys, A comprehensiveevaluation of module detection methods for gene expressiondata. Nat. Commun. 9, 1090 (2018). doi: 10.1038/s41467-018-03424-4; pmid: 29545622

47. M. Bastian, S. Heymann, M. Jacomy, in Third International AAAIConference on Weblogs and Social Media (2009); www.aaai.org/ocs/index.php/ICWSM/09/paper/viewPaper/154.

48. M. Jacomy, T. Venturini, S. Heymann, M. Bastian, ForceAtlas2,a continuous graph layout algorithm for handy networkvisualization designed for the Gephi software. PLOS ONE 9,e98679 (2014). doi: 10.1371/journal.pone.0098679;pmid: 24914678

49. C. Aurrecoechea et al., PlasmoDB: A functional genomicdatabase for malaria parasites. Nucleic Acids Res. 37,D539–D543 (2009). doi: 10.1093/nar/gkn814; pmid: 18957442

50. A. T. L. Lun, D. J. McCarthy, J. C. Marioni, A step-by-stepworkflow for low-level analysis of single-cell RNA-seq data withBioconductor. F1000 Res. 5, 2122 (2016). pmid: 27909575

51. T. L. Bailey, DREME: Motif discovery in transcription factorChIP-seq data. Bioinformatics 27, 1653–1659 (2011).doi: 10.1093/bioinformatics/btr261; pmid: 21543442

52. T. L. Campbell, E. K. De Silva, K. L. Olszewski, O. Elemento,M. Llinás, Identification and genome-wide prediction of DNAbinding specificities for the ApiAP2 family of regulatorsfrom the malaria parasite. PLOS Pathog. 6, e1001165 (2010).doi: 10.1371/journal.ppat.1001165; pmid: 21060817

53. I. Kaneko, S. Iwanaga, T. Kato, I. Kobayashi, M. Yuda,Genome-Wide Identification of the Target Genes of AP2-O, aPlasmodium AP2-Family Transcription Factor. PLOS Pathog. 11,e1004905 (2015). doi: 10.1371/journal.ppat.1004905;pmid: 26018192

54. S. Gupta, J. A. Stamatoyannopoulos, T. L. Bailey, W. S. Noble,Quantifying similarity between motifs. Genome Biol. 8, R24(2007). doi: 10.1186/gb-2007-8-2-r24; pmid: 17324271

55. T. S. Andrews, M. Hemberg, M3Drop: Dropout-based featureselection for scRNASeq. Bioinformatics (2018). doi: 10.1093/bioinformatics/bty1044; pmid: 30590489

56. T. Carver, S. R. Harris, M. Berriman, J. Parkhill, J. A. McQuillan,Artemis: An integrated platform for visualization and analysisof high-throughput sequence-based experimental data.Bioinformatics 28, 464–469 (2012). doi: 10.1093/bioinformatics/btr703; pmid: 22199388

57. A. J. Enright, S. Van Dongen, C. A. Ouzounis, An efficientalgorithm for large-scale detection of protein families. NucleicAcids Res. 30, 1575–1584 (2002). doi: 10.1093/nar/30.7.1575;pmid: 11917018

58. R. C. Edgar, MUSCLE: A multiple sequence alignment methodwith reduced time and space complexity. BMC Bioinformatics5, 113 (2004). doi: 10.1186/1471-2105-5-113; pmid: 15318951

59. S. R. Eddy, Accelerated Profile HMM Searches. PLOS Comput.Biol. 7, e1002195 (2011). doi: 10.1371/journal.pcbi.1002195;pmid: 22039361

60. D. J. McCarthy, K. R. Campbell, A. T. L. Lun, Q. F. Wills, Scater:Pre-processing, quality control, normalization andvisualization of single-cell RNA-seq data in R. Bioinformatics33, 1179–1186 (2017). pmid: 28088763

61. L. Li, C. J. Stoeckert Jr., D. S. Roos, OrthoMCL: Identificationof ortholog groups for eukaryotic genomes. Genome Res.13, 2178–2189 (2003). doi: 10.1101/gr.1224503;pmid: 12952885

62. M. Baron et al., A Single-Cell Transcriptomic Map of the Humanand Mouse Pancreas Reveals Inter- and Intra-cell PopulationStructure. Cell Syst. 3, 346–360.e4 (2016). doi: 10.1016/j.cels.2016.08.011; pmid: 27667365

ACKNOWLEDGMENTS

We thank the staff of the Illumina Bespoke Sequencing and CoreCytometry teams at the Wellcome Sanger Institute for theircontribution, as well as J. Lee (QIMR Berghofer Medical ResearchInstitute) and A. Cunnington (Imperial College London) for theircontribution to the deconvolution of bulk RNA-seq data. Funding:The Wellcome Sanger Institute is funded by the Wellcome Trust(grant 206194/Z/17/Z), which supports M.K.N.L., M.H., O.B., M.B.,

and J.C.R. Also supported by MRC Career Development AwardG1100339 (M.K.N.L.); the Danish Institute of Advanced Study(D-IAS) and the University of Southern Denmark (SDU) (K.N.);Wellcome Trust grant 107372, the UK Department for InternationalDevelopment (DFID), the Swedish International DevelopmentCooperation Agency (Sida), the Swiss Agency for Development andCooperation (SDC), the German Academic Exchange Service(DAAD), and the Kenyan government (J.K.H. and H.B.); NIH/NIAIDgrant R01AI137154 (J.R. and L.V.); and Chan Zuckerberg Initiativegrant reference 183501 (T.A.). Author contributions: M.K.N.L. andA.M.T. conceived of and designed the study. Experiments weredesigned by V.M.H., A.J.C.R., M.K.N.L., and A.M.T. Parasite isolationoptimization was performed by V.M.H. (mosquito stages), A.C.J.R.(blood stages, field parasites), and A.M.T. (all stages). V.M.H. andA.C.J.R. performed cell sorting and library preparation for Smart-seq2 and 10x, respectively. V.M.H. performed and oversaw theSmart-seq2 data analysis. V.M.H., A.C.J.R., and T.A. performed the10x data analysis. H.H. performed the k-NN gene graph and spectralclustering analyses. T.A. performed the RNA velocity analyses.K.N. performed analyses on HeLa cells. A.J.R. performed analyseson bulk transcriptomes and pirs. L.H.V. and T.M. assisted with thelab work to produce P. knowlesi and P. berghei, respectively. O.B.,J.C.R., M.B., and M.H. provided supervisory support to T.M.,L.H.V., A.J.R., and T.A., respectively. M.H. also assisted with dataanalysis. H.B. and J.K.H. assisted with field work to captureP. falciparum and P. malariae from naturally infected donors. V.M.H.,A.J.C.R., A.M.T., and M.K.N.L. wrote the manuscript, withcontributions from other authors. Competing interests: H.H. waspreviously an employee of 10x Genomics and holds shares in thatcompany. Data and materials availability: All raw sequencing datahave been deposited in the European Nucleotide Archive atEuropean Molecular Biology Laboratory European BioinformaticsInstitute (www.ebi.ac.uk/ena/) under accession number ERP110344,and processed data are accessible via Github, and the Malaria CellAtlas website (28). Expression matrices and analysis code areavailable on Github and archived on Zenodo (27) (DOI: 10.5281/zenodo.2843883).

SUPPLEMENTARY MATERIALS

science.sciencemag.org/content/365/6455/eaaw2619/suppl/DC1Figs. S1 to S20Tables S1 and S2References (63–78)Data S1 to S4

22 January 2019; accepted 12 July 201910.1126/science.aaw2619

Howick et al., Science 365, eaaw2619 (2019) 23 August 2019 11 of 11

RESEARCH | RESEARCH ARTICLEon N

ovember 29, 2020

http://science.sciencem

ag.org/D

ownloaded from

Page 13: MALARIA The Malaria Cell Atlas: Single parasite ...€¦ · high-resolution mapping of developmental processes, cellular diversity, and cell-to-cell ... individuals.The Malaria Cell

cycle lifePlasmodiumThe Malaria Cell Atlas: Single parasite transcriptomes across the complete

M. Talman and Mara K. N. LawniczakTom Metcalf, Lisa H. Verzier, Julian C. Rayner, Matthew Berriman, Jeremy K. Herren, Oliver Billker, Martin Hemberg, Arthur Virginia M. Howick, Andrew J. C. Russell, Tallulah Andrews, Haynes Heaton, Adam J. Reid, Kedar Natarajan, Hellen Butungi,

DOI: 10.1126/science.aaw2619 (6455), eaaw2619.365Science 

, this issue p. eaaw2619; see also p. 753Sciencefrom patient isolates to be deconvoluted and for classification of parasitemia according to developmental stage.

transcriptomesP. malariae and P. falciparumsubstructure of established life-cycle stages. The atlas also allowed for genes. These clusters enabled functional assignment of hypothetical and conserved genes, and they hint at further

identified 20 ''modules'' among 5156 core transcriptomeP. bergheilife-cycle stages of the rodent-model malaria parasite malaria parasite life cycle (see the Perspective by Winzeler). Single-cell transcriptomes generated from 10 different

present a single-cell analysis of transcription across theet al.efforts, a huge global disease burden remains. Howick cause human malarial diseases, and, despite determined controlPlasmodiumSeveral species of the parasite

Mapping the malaria parasite

ARTICLE TOOLS http://science.sciencemag.org/content/365/6455/eaaw2619

MATERIALSSUPPLEMENTARY http://science.sciencemag.org/content/suppl/2019/08/21/365.6455.eaaw2619.DC1

CONTENTRELATED

http://stm.sciencemag.org/content/scitransmed/11/473/eaan4479.fullhttp://stm.sciencemag.org/content/scitransmed/10/447/eaar3619.fullhttp://science.sciencemag.org/content/sci/365/6455/753.full

REFERENCES

http://science.sciencemag.org/content/365/6455/eaaw2619#BIBLThis article cites 74 articles, 10 of which you can access for free

PERMISSIONS http://www.sciencemag.org/help/reprints-and-permissions

Terms of ServiceUse of this article is subject to the

is a registered trademark of AAAS.ScienceScience, 1200 New York Avenue NW, Washington, DC 20005. The title (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement ofScience

Science. No claim to original U.S. Government WorksCopyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of

on Novem

ber 29, 2020

http://science.sciencemag.org/

Dow

nloaded from