Detection of structural variants and copy number alterations in cancer: from computational strategies to the discovery of chromothripsis in neuroblastoma
Introduction
CNA & LOH detection (FREEC)
Discovery of chromothripsis in neuroblastoma
• Detection of CNA regions
• Detection of LOH regions
• Possibility to work without a control sample
• Possibility to set tumor ploidy
• Automatic window selection
• Use of mappability information
• Evaluation of, and adjustment for, contamination of tumor samples by normal cells
• Possibility to work with exome data
• Possibility to cross the output with the output of SVDetect
1 Inserm U900, 75248 Paris, France
2 Mines ParisTech, Fontainebleau, F-77300 France
3 Institut Curie, 26, rue d’Ulm, 75248 Paris, France
4 Inserm U830, 75248 Paris, France
To find the best polynomial fit of read count (RC) against GC-content, shown in black (A-D), we first initialize the polynomial's parameters from the median RC value for each GC-content value. We then optimize the parameters by iteratively selecting the data points that belong to P-copy regions and fitting them by least squares.
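The fitting loop described above could be sketched roughly as follows. This is a minimal illustration, not the Control-FREEC implementation: the function name `fit_gc_curve`, the polynomial degree, the GC binning used for the initialization, and the rule "a window is treated as P-copy if its RC lies within a relative tolerance of the current curve" are all assumptions made for the example.

```python
import numpy as np

def fit_gc_curve(gc, rc, degree=3, n_iter=10, tol=0.25, n_bins=20):
    """Iteratively fit RC as a polynomial of GC-content (illustrative sketch)."""
    gc = np.asarray(gc, dtype=float)
    rc = np.asarray(rc, dtype=float)

    # Initialization: fit the polynomial to the median RC of each GC bin.
    edges = np.linspace(gc.min(), gc.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(gc, edges) - 1, 0, n_bins - 1)
    centers, medians = [], []
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():
            centers.append(gc[in_bin].mean())
            medians.append(np.median(rc[in_bin]))
    coeffs = np.polyfit(centers, medians, degree)

    # Optimization: keep windows close to the current curve (assumed to be
    # the main-ploidy windows) and refit them by least squares.
    for _ in range(n_iter):
        fitted = np.polyval(coeffs, gc)
        keep = np.abs(rc - fitted) < tol * np.abs(fitted)
        if keep.sum() <= degree:
            break  # too few points left for a stable fit
        new_coeffs = np.polyfit(gc[keep], rc[keep], degree)
        converged = np.allclose(new_coeffs, coeffs)
        coeffs = new_coeffs
        if converged:
            break
    return coeffs
```

Dividing each window's RC by `np.polyval(coeffs, gc)` would then give a GC-normalized ratio profile, in which windows at the main ploidy sit near 1.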
In many studies that apply deep sequencing to cancer genomes, one has to calculate copy number profiles (CNPs) and predict regions of gain and loss. Two obstacles frequently arise in the analysis of cancer genomes: the absence of an appropriate control sample from normal tissue and possible polyploidy. We therefore developed Control-FREEC [1,2], which automatically detects Copy Number Alterations (CNAs), with or without a control dataset, and Loss of Heterozygosity (LOH) regions.
For mate-pair/paired-end mapping (PEM) data, one can complement the information about CNAs (i.e., the output of Control-FREEC) with the predictions of Structural Variants (SVs) made by another tool that we developed, SVDetect [3]. Here we used a combination of Control-FREEC and SVDetect (http://bioinfo-out.curie.fr/projects/freec/sv.html) on neuroblastoma samples to (1) refine the coordinates of CNAs using PEM data and (2) improve confidence in calling true positive rearrangements (particularly in ambiguous satellite/repetitive regions).
For mate-pair/paired-end mapping (PEM) data, one can complement the information about copy number changes (i.e., the output of Control-FREEC) with the predictions of structural variants (SVs) made by SVDetect [3]. Automatic intersection of the Control-FREEC and SVDetect outputs allows one to:
[1] Boeva V., et al. Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics, 2011; 27(2):268-9. http://bioinfo-out.curie.fr/projects/freec/
[2] Boeva V., et al. Control-FREEC: a tool for assessing copy number and allelic content using next generation sequencing data. Bioinformatics, 2012; 28(3):423-5.
[3] Zeitouni B., et al. SVDetect - a bioinformatic tool to identify genomic structural variations from paired-end next-generation sequencing data. Bioinformatics, 2010; 26:1895-1896. http://svdetect.sourceforge.net
Calculation of dependency function “RC vs GC-content” or “RC sample vs RC control”
Window size selection
W = L / (T × CV²), where L is the genome length, T is the total number of reads, and CV is the user-defined coefficient of variation.
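As a quick numerical illustration of the window-size formula, here is a small sketch. The function name `window_size` and the example figures are hypothetical; rounding to a whole number of base pairs is an assumption of the example. Under a Poisson model of read counts, a window of this size holds about 1/CV² reads on average, which is what gives the requested coefficient of variation.

```python
def window_size(genome_length, total_reads, cv):
    """Window size W = L / (T * CV^2), rounded to whole base pairs."""
    if genome_length <= 0 or total_reads <= 0 or cv <= 0:
        raise ValueError("genome_length, total_reads and cv must be positive")
    return max(1, round(genome_length / (total_reads * cv ** 2)))


# Hypothetical example: a 3 Gb genome, 60 million reads, CV = 0.05.
w = window_size(3_000_000_000, 60_000_000, 0.05)
print(w)
```

Lowering CV (less noise per window) or sequencing fewer reads both push the window size up, so resolution is traded for stability.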
• Refine coordinates of CNAs using PEMs
• Filter out false predictions of SVDetect (often in ambiguous satellite/repetitive regions)
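One simple way to picture the coordinate refinement is to snap each CNA boundary to the nearest SV breakpoint on the same chromosome, provided one lies close enough. The sketch below is only an illustration of that idea: the function `refine_breakpoints`, the tuple layout, and the `max_dist` threshold are hypothetical and do not describe the actual intersection script distributed with Control-FREEC and SVDetect.

```python
def refine_breakpoints(cna_breakpoints, sv_breakpoints, max_dist):
    """Snap CNA boundaries to nearby SV breakpoints (illustrative sketch).

    cna_breakpoints, sv_breakpoints: lists of (chromosome, position) tuples.
    Returns a list of (chromosome, position, supported) tuples, where
    `supported` tells whether an SV breakpoint was found within max_dist.
    """
    refined = []
    for chrom, pos in cna_breakpoints:
        candidates = [
            sv_pos
            for sv_chrom, sv_pos in sv_breakpoints
            if sv_chrom == chrom and abs(sv_pos - pos) <= max_dist
        ]
        if candidates:
            # Window-level CNA boundary replaced by the closest PEM-level one.
            best = min(candidates, key=lambda sv_pos: abs(sv_pos - pos))
            refined.append((chrom, best, True))
        else:
            refined.append((chrom, pos, False))
    return refined


cna = [("chr1", 100_000), ("chr2", 500_000)]
sv = [("chr1", 101_234), ("chr3", 500_100)]
print(refine_breakpoints(cna, sv, max_dist=5_000))
```

The same matching, read in the other direction, gives a crude filter: an SV prediction with no copy number change nearby is a candidate false positive, although balanced rearrangements would also fail such a test.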
Valentina Boeva1,2,3, Bruno Zeitouni1,2,3, Tatiana Popova1,2,3, Kevin Bleakley1,2,3, Andrei Zinovyev1,2,3, Jean-Philippe Vert1,2,3, Isabelle Janoueix-Lerosey3,4, Olivier Delattre3,4 and Emmanuel Barillot1,2,3
E-mail: [email protected]
Segmentation
Segmentation is performed by a LASSO-based algorithm proposed by Harchaoui and Lévy-Leduc (2008).
Adjustment for a possible contamination by normal cells
Control-FREEC uses the following formula to evaluate the fraction of contaminating normal cells p, and then correct copy number profiles:
NRC_i ≈ E_i + (1 - E_i) p,
where NRC_i is the normalized read count in window i and E_i is the expected ratio in window i.
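The contamination formula can be inverted in two ways: solved for p when the expected ratio of a window is known, and solved for the expected ratio once p is known. The sketch below illustrates both. The function names, the use of the median over windows, and the `min_shift` cutoff are assumptions made for the example; they are not taken from the Control-FREEC source.

```python
from statistics import median

def mix_ratio(expected, p):
    """Ratio observed when a fraction p of cells is normal: E + (1 - E) * p."""
    return expected + (1.0 - expected) * p

def estimate_contamination(nrc, expected, min_shift=0.1):
    """Estimate p as the median of (NRC_i - E_i) / (1 - E_i).

    Windows whose expected ratio is close to 1 carry no information
    about p (the formula degenerates), so they are skipped.
    """
    estimates = [
        (n - e) / (1.0 - e)
        for n, e in zip(nrc, expected)
        if abs(1.0 - e) >= min_shift
    ]
    if not estimates:
        raise ValueError("no window with an expected ratio different from 1")
    # Clamp to a valid fraction, staying strictly below 1.
    return min(max(median(estimates), 0.0), 0.99)

def correct_ratios(nrc, p):
    """Invert the formula: E_i = (NRC_i - p) / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return [(n - p) / (1.0 - p) for n in nrc]


# Hypothetical example: 30% normal cells; a loss (0.5), a neutral
# window (1.0) and a gain (1.5) in a diploid tumor.
expected = [0.5, 1.0, 1.5, 0.5]
observed = [mix_ratio(e, 0.3) for e in expected]
p_hat = estimate_contamination(observed, expected)
print(p_hat, correct_ratios(observed, p_hat))
```

Contamination pulls every ratio toward 1, so a one-copy loss observed at 0.65 instead of 0.5 is consistent with roughly 30% normal cells in this toy example.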
Results and graphical visualization
1. List of gains and losses with assigned copy numbers
2. Visualization in R
3. Creation of different file format outputs for graphical visualization: Circos, UCSC Genome Browser (BedGraph)
SVDetect [3] is a tool that allows the user to:
• Identify candidate SVs using the clustering of discordant PEMs
• Predict the type of an SV using the PEM signature
• Filter out PEMs inconsistent with the main signature of the predicted SV
• Compare SVs predicted for different samples
• Create different file format outputs for graphical visualization of predicted SVs
Illustrations of read signatures for SV type prediction (implemented in SVDetect [3])
Intra-chromosomal SVs Inter-chromosomal SVs
Circos representation of SVs predicted by SVDetect and confirmed by the CNAs identified by Control-FREEC. (A-C) NB1141, (D-F) NB1142. (A, D) whole genome view, (B, E) zoom on chromothripsis, (C, F) copy number profile for chr1 of NB1141 and chr6 of NB1142.
Calculation of BAF profiles
[Figure: profiles of normalized copy number and B allele frequency]
Annotation of B allele frequency profiles using Gaussian mixture model fit
Primary neuroblastoma tumors with chromothripsis
Neuroblastoma cell lines
CLB-GA
CLB-RE
Detection of SVs (SVDetect)
We investigated somatic rearrangements in two neuroblastoma cell lines and two primary tumors using paired-end sequencing of mate-pair libraries.