Inferring clonal composition of a breast cancer from multiple tissue samples Habil Zare Department...
1
Inferring clonal composition of a breast cancer
from multiple tissue samples
Habil Zare
Department of Genome Sciences
University of Washington
19 Dec 2013
2
Hypothesis
Because cancer is a heterogeneous disease, synergistic medications can
treat it better than a single drug.
3
Traditional concept of a tumor
Schematic figure
4
Most tumors are heterogeneous
Schematic figure
5
Different clones have different genotypes and phenotypes
Clone 1
Clone 2
Clone 3
Clone 4
Clone 5
Clone 6
Schematic figure
6
It is important to identify the clonal composition
Treatment A
Treatment B
Relapse
Relapse
7
It is important to identify the clonal composition
?
9
Our approach to analyze multiple samples from a single tumor
10
Our approach to analyze multiple samples from a single tumor
Each sample has different information about the clonal composition
PCR
PCR
PCR
Next Gen Sequencing
Next Gen Sequencing
Next Gen Sequencing
Counting the number of reads which support each mutation
A closer look at the Next-Gen Sequencing output
• At each locus, two integers are provided: the total number of analyzed reads and the number of reads supporting the mutation.
• Because different clones have different contributions to each sample, these numbers vary across the samples.
How to use this variation to infer the clonal composition?
13
The observations
The observations boil down to the number of reads that support each allele.
• M samples
• Mutations on N loci
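As a minimal sketch of how these observations could be stored, the two integers per locus and sample fit naturally into two N x M arrays. The numbers and the names `total_reads`, `variant_reads`, and `observed_variant_fraction` are hypothetical and are not taken from the study.

```python
import numpy as np

# Hypothetical toy data: N = 3 loci (rows) and M = 2 samples (columns).
total_reads = np.array([[120, 95], [200, 180], [150, 160]])
variant_reads = np.array([[30, 10], [0, 45], [60, 62]])


def observed_variant_fraction(variant_reads, total_reads):
    """Fraction of reads supporting the mutation, per locus and sample."""
    variant_reads = np.asarray(variant_reads, dtype=float)
    total_reads = np.asarray(total_reads, dtype=float)
    if np.any(variant_reads > total_reads):
        raise ValueError("variant reads cannot exceed total reads")
    # Assumes every locus was covered by at least one read in every sample.
    return variant_reads / total_reads


vaf = observed_variant_fraction(variant_reads, total_reads)
```

Each row of `vaf` then shows how the support for one mutation varies across the samples, which is the variation the model uses.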
Building a generative model
Tumor
14
Building a generative model
Given the parameters, how to generate data?
Data
Parameters
Generate
15
Data
Parameters
Generate
?
Building a generative model
Given the parameters, how to generate data?
16
Generate
Building a generative model
Parameters?
17
The main assumption on the distribution of reads
Mutation i can be present or absent in each clone
Project on Mutation i
Building a generative model
Assumption: Reads are sampled uniformly at random => the number of variant reads is binomially distributed
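A minimal sketch of this assumption: if each read is drawn uniformly at random, the number of reads showing the variant at one locus in one sample is a binomial draw. The function name, the read depth of 1000, and the frequency of 0.25 are illustrative assumptions.

```python
import numpy as np


def sample_variant_reads(total_reads, variant_freq, rng):
    """Draw the number of variant-supporting reads under the binomial assumption."""
    return rng.binomial(total_reads, variant_freq)


rng = np.random.default_rng(0)
# Hypothetical locus: 1000 reads, variant allele frequency 0.25.
count = sample_variant_reads(1000, 0.25, rng)
```

Repeated draws scatter around `total_reads * variant_freq`, and the scatter shrinks relative to the mean as the read depth grows.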
18
The main assumption on the distribution of reads
Mutation i can be present or absent in each clone
Project on Mutation i
Number of reads exhibiting the variant allele at locus i in sample j.
Total number of reads
Frequency of variant allele
Assumption
Building a generative model
19
A close look at the binomial distribution
Total number of reads: observed
Number of reads exhibiting the variant: observed
Frequency of variant allele: ?
The frequency of the variant allele depends on:
1. Which clones contain mutation i?
2. What is the frequency of those clones in sample j?
Building a generative model
20
Introducing the hidden variables
If Z_{i,c} = 1, clone c has a variant allele at locus i.
The frequency of the variant allele depends on:
1. Which clones contain mutation i?
2. What is the frequency of those clones in sample j?
Building a generative model
21
Notation for the model parameters
The frequency of the variant allele depends on:
1. Which clones contain mutation i?
2. What is the frequency of those clones in sample j?
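The two dependencies above can be sketched as a matrix product, assuming a genotype matrix Z (loci x clones, entries 0 or 1) and a clone frequency matrix P (clones x samples, columns summing to 1). The index order of P and the factor `allele_fraction=0.5`, which assumes every mutation is heterozygous, are assumptions that the slides do not state.

```python
import numpy as np


def variant_allele_frequency(Z, P, allele_fraction=0.5):
    """Expected variant allele frequency at each locus (row) in each sample (column).

    Z[i, c] = 1 if clone c carries mutation i; P[c, j] is the frequency of
    clone c in sample j. allele_fraction=0.5 assumes heterozygous mutations.
    """
    Z = np.asarray(Z, dtype=float)
    P = np.asarray(P, dtype=float)
    return allele_fraction * (Z @ P)


# Hypothetical example: 2 loci, 3 clones (the first clone is normal), 2 samples.
Z = [[0, 1, 1],
     [0, 0, 1]]
P = [[0.5, 0.2],
     [0.3, 0.3],
     [0.2, 0.5]]
F = variant_allele_frequency(Z, P)
```

Here the first mutation is carried by clones 2 and 3, so its expected frequency in sample 1 is 0.5 * (0.3 + 0.2) = 0.25.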
Building a generative model
22
Building a generative model
Parameters
C
Generate
?
23
The assumptions
• Each mutation can occur at a locus independently at random.
• The samples are independent of each other.
Building a generative model
24
Building a generative model
Parameters
C
Generate
Technical
25
Overview of the generative model from parameters to the observations
C
Parameters
Observations
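The path from parameters to observations can be sketched as a short simulation. This is an illustration under stated assumptions, not the author's software: genotypes are drawn as fair coin flips, clone frequencies come from a flat Dirichlet distribution, the first clone is treated as normal, every locus has the same read depth, and mutations are heterozygous (`allele_fraction=0.5`). All names are hypothetical.

```python
import numpy as np


def simulate_counts(n_loci, n_samples, n_clones, depth, rng, allele_fraction=0.5):
    """Generate variant read counts X from random genotypes Z and clone frequencies P."""
    # Each mutation is present or absent in each clone, independently (assumed p = 0.5).
    Z = rng.integers(0, 2, size=(n_loci, n_clones))
    Z[:, 0] = 0  # assumed convention: the first clone is the normal clone
    # One frequency vector per sample; transposed so that P is clones x samples.
    P = rng.dirichlet(np.ones(n_clones), size=n_samples).T
    R = np.full((n_loci, n_samples), depth)  # total reads, constant depth (assumed)
    F = allele_fraction * (Z @ P)            # expected variant allele frequencies
    X = rng.binomial(R, F)                   # binomially distributed variant reads
    return Z, P, R, X


Z, P, R, X = simulate_counts(n_loci=5, n_samples=3, n_clones=4, depth=100,
                             rng=np.random.default_rng(1))
```

Because Z and P are known here, data generated this way can later be used to check whether inference recovers them.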
26
Inference
Given the observed counts, how do we infer the clonal structure?
C
Inference
Technical
EM
27
We infer model parameters using expectation-maximization
Details omitted
Derived from the binomial distribution
Derived from Bernoulli distribution
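The slides omit the EM updates, so the following is not the author's algorithm. It only sketches the quantity that the binomial assumption implies: the log-likelihood of the observed counts for given Z and P, up to a constant that does not depend on the parameters. The heterozygosity factor `allele_fraction=0.5` and all names are assumptions.

```python
import numpy as np


def binomial_log_likelihood(X, R, Z, P, allele_fraction=0.5, eps=1e-12):
    """Log-likelihood of variant counts X given total reads R, genotypes Z, frequencies P.

    The binomial coefficient is dropped because it does not depend on Z or P.
    """
    X = np.asarray(X, dtype=float)
    R = np.asarray(R, dtype=float)
    F = allele_fraction * (np.asarray(Z, dtype=float) @ np.asarray(P, dtype=float))
    F = np.clip(F, eps, 1.0 - eps)  # keep the logarithms finite
    return float(np.sum(X * np.log(F) + (R - X) * np.log1p(-F)))


# Hypothetical check: 1 locus, 2 clones, 1 sample, 20 variant reads out of 100.
X, R, Z = [[20]], [[100]], [[0, 1]]
ll_matching = binomial_log_likelihood(X, R, Z, P=[[0.6], [0.4]])  # implies F = 0.2
ll_other = binomial_log_likelihood(X, R, Z, P=[[0.2], [0.8]])     # implies F = 0.4
```

Parameters whose implied frequency matches the observed fraction 20/100 score higher, which is the signal an iterative procedure such as EM climbs.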
28
How can we evaluate whether the model works?
Inference
Two rounds of next gen sequencing
C
29
We do not know the reality
~Inferred Reality
30
Generating synthetic data
Inference
C`
Generate
31
Inference
C
Generate
Generating synthetic data
Random parameters
compare
32
• Genotype error: The frequency of false entries in the genotype matrix Z
• Clone frequency error: The average error in entries of the frequency matrix P
Defining accuracy criteria
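The two criteria can be sketched as follows, assuming "average error" means the mean absolute difference between entries. Inferred clones come in arbitrary order, so the sketch also aligns columns by brute force before comparing; that alignment step and all names are assumptions, and it is only practical for a handful of clones.

```python
from itertools import permutations

import numpy as np


def genotype_error(Z_true, Z_hat):
    """Fraction of entries of the genotype matrix Z that are wrong."""
    return float(np.mean(np.asarray(Z_true) != np.asarray(Z_hat)))


def clone_frequency_error(P_true, P_hat):
    """Mean absolute difference between entries of the frequency matrix P."""
    return float(np.mean(np.abs(np.asarray(P_true) - np.asarray(P_hat))))


def best_clone_alignment(Z_true, Z_hat):
    """Column order of Z_hat that minimizes the genotype error (brute force)."""
    Z_true, Z_hat = np.asarray(Z_true), np.asarray(Z_hat)
    orders = permutations(range(Z_true.shape[1]))
    return min(orders, key=lambda order: genotype_error(Z_true, Z_hat[:, list(order)]))


# Hypothetical example: the inferred clones are correct but listed in swapped order.
Z_true = np.array([[0, 1], [0, 0], [0, 1]])
Z_hat = np.array([[1, 0], [0, 0], [1, 0]])
order = best_clone_alignment(Z_true, Z_hat)
aligned_error = genotype_error(Z_true, Z_hat[:, list(order)])
freq_error = clone_frequency_error([[0.7, 0.4], [0.3, 0.6]], [[0.6, 0.4], [0.4, 0.6]])
```

When a frequency matrix is compared, its rows should be reordered with the same permutation so that both matrices refer to the same clones.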
33
Simulation shows genotype error decreasing with increasing samples
34
Simulation shows genotype error decreasing with increasing samples
35
Clone frequency error shows a similar trend
36
M1
P1
P3
P2
Experiment with real data
Study on a primary breast cancer
• 10 breast tumor samples
• 1 adjacent normal sample
• 2 samples from the metastatic lymph node
37
Clone frequencies vary smoothly across the tumor sections
The model doesn’t know anything about the anatomic location of the samples!
38
Clone frequencies vary smoothly across the tumor sections
39
Phylogenetic analysis tells the story of the tumor over time
40
Five clone solution
41
Six-clone solution is consistent with five-clone solution
42
Next-Gen Sequencing Data
Oncologists
Clonal structure
EM
Validated by simulations
Anatomic variation of clones
Phylogenetic trees
Overview of the project
Inferring clonal composition of a breast cancer from multiple tissue samples
43
Software publicly available
44
Supplementary slides
Proposed project based on former experiences:
Identifying clonal decomposition using sub-tissues
SamSPECTRAL
Sort cell populations
Next Gen Sequencing
Next Gen Sequencing
Next Gen Sequencing
Next Gen Sequencing
Leukemia or lymphoma sample
Clonal analysis