Normal/Tumor somatic mutations report tool

Post on 30-Jun-2015


description

Presentation used for the oral defense of my Master's Thesis at the Universitat Autònoma de Barcelona. It presents the development of a Perl script that automatically generates a report of the somatic mutations found in a Normal/Tumor cancer experiment.

Transcript of Normal/Tumor somatic mutations report tool

Development of a bioinformatics tool for the automated generation of a report of the somatic mutations found in a Normal/Tumor cancer experiment

Isaac Noguera Guixà

Universitat Autònoma de Barcelona

15th of July, 2014

Project tutor: Dr. Raúl Tonda. Data analysis team, Centre Nacional d'Anàlisi Genòmica (CNAG), PCB

Academic tutor: Dr. Miguel Perez-Enciso. Centre for Research in Agricultural Genomics (CRAG), UAB

Course 2013 - 2014

Master’s Thesis


Table of contents

Introduction

◦ Cancer genetics

◦ Cancer in Bioinformatics

Objectives

Material and methods

Results

Conclusions


Introduction

Loss of normal growth control

[Figure labels: Normal cell; Cell damage (no repair); Cell suicide (apoptosis); 1st mutation; 2nd mutation; 3rd mutation; Uncontrolled growth]

Yulug, I. (2006). Molecular basis of cancer [PowerPoint slides]. Retrieved from http://www.hugointernational.org/resources/Isik_Yulug_Molecular_Basis_of_cancer_bilingual.ppt


Introduction

Cancer in Bioinformatics

Normal/Tumor experiment

[Figure labels: Normal sample; Tumor sample; Read mapping and variant calling]

Lopez-Bigas, N. (2011). Identification of cancer drivers across tumor types [PowerPoint slides]. Retrieved from http://es.slideshare.net/nurialopezbigas/identification-of-cancer-drivers-across-tumor-types#

A variant is determined by the joint status in tumor-normal sequence pairs


Variant call format (vcf)

Introduction

Cancer in Bioinformatics

Normal/Tumor experiment

(Danecek, P. et al., 2011)


Objectives

Main objective

Develop an automated tool to produce a report of the somatic variants found in a Normal/Tumor experiment

→ Process the output of CNAG's variant calling pipeline

→ Filter the somatic variants from it and extract relevant statistics from them

→ Identify those variants that are already known and annotated in cancer somatic mutations databases

→ Transform the obtained data into tables and graphics to include in the report

→ Fill a report template, kept independent from the code of the main script, with the processed data

→ Generate the report document in a printable format such as Portable Document Format (PDF)

→ Execute all these steps sequentially and automatically

Additional objective

Incorporate the developed tool as an additional step in the variant calling pipeline of CNAG's Data Analysis team


Material and methods

Basis of the developed tool:

Main script: Perl script

◦ Template module

◦ Input data processing

◦ Output data generation

Template document: Template Toolkit script

◦ LaTeX code with R and Template Toolkit code embedded
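The separation between the main script and the template document can be illustrated with a minimal sketch. The tool itself is a Perl script using Template Toolkit; the Python below swaps in the standard library's `string.Template` purely for illustration. The template text, the placeholder names (`project`, `n_somatic`) and the `fill_template` function are hypothetical and do not come from the presentation.

```python
from string import Template

# Hypothetical report template. In the real tool the template lives in its
# own Template Toolkit (.tt) file; it is inlined here to stay self-contained.
REPORT_TEMPLATE = Template(
    "\\section*{Somatic variants report: $project}\n"
    "Somatic variants found: $n_somatic\n"
)


def fill_template(template: Template, data: dict) -> str:
    """Fill the template with processed data.

    Raises KeyError if the data lacks a field that the template uses.
    """
    return template.substitute(data)


if __name__ == "__main__":
    # The count below is a made-up illustrative value, not a real result.
    print(fill_template(REPORT_TEMPLATE, {"project": "FAMCOLON", "n_somatic": 3}))
```

Because the script only passes a dictionary of processed data, the layout of the report can change without touching the processing code, which is the point of keeping the template separate.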


Material and methods

Designed pipeline:

Input data: CNAG's vcf

→ Data processing: COSMICdb annotation, somatic variants filtering, output data storing/generation

→ Template processing: Template Toolkit document → Noweb document

→ R Sweave: Noweb document → LaTeX document

→ pdflatex: LaTeX document → PDF document
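The last two steps of the designed pipeline (R Sweave, then pdflatex) could be chained roughly as follows. This is a sketch in Python, not the tool's Perl code; the function names and the file name `report.Rnw` are hypothetical, and it assumes that `R` and `pdflatex` are available on the PATH.

```python
import subprocess
from pathlib import Path


def build_commands(noweb_file: str) -> list[list[str]]:
    """Return the Sweave and pdflatex commands for a Noweb (.Rnw) file."""
    # Sweave writes its .tex output into the current working directory,
    # so only the base name of the input file is kept.
    tex_file = Path(noweb_file).with_suffix(".tex").name
    return [
        ["R", "CMD", "Sweave", noweb_file],  # Noweb document -> LaTeX document
        ["pdflatex", tex_file],              # LaTeX document -> PDF document
    ]


def run_pipeline(noweb_file: str) -> None:
    """Run each step in order, stopping at the first failing command."""
    for cmd in build_commands(noweb_file):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in build_commands("report.Rnw"):
        print(" ".join(cmd))
```

Building the command list separately from running it keeps the sequence easy to inspect and to test without R or LaTeX installed.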

##INFO=<ID=FP,Number=1,Type=Float,Description="Fisher test P-value for somatic comparison.">

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR

Chr1 883814 . A G 18.1 mrd10 DP=36;UPSTREAM(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000496938|);FP=0.00604 GT:PL:DP 0/0:0,96,255:32 0/1:51,0,26:3

Chr20 126154 dbSNPBuildID=137;GMAF=0.1648 T A 64.7 mrp0.05 INDEL;EFF=FRAME_SHIFT(HIGH||||DEFB126|protein_coding|CODING|ENST00000382398|exon_20_126056_126392;FP=1 GT:PL:DP 1/1:255,255,0:274 0/1:253,0,45:26
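As an illustration of the somatic variants filtering step, the sketch below reads the `FP` value (the Fisher test p-value declared in the header above) from the INFO column of a record and compares it with a threshold. It is written in Python rather than the tool's Perl, the function names are hypothetical, and the rule "keep the variant when FP is at or below the threshold" is an assumption about how the filter works. The example record is the first one shown above, with its INFO column in the standard eighth position.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; entries without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def is_somatic(record_line: str, p_threshold: float) -> bool:
    """Return True if the record's FP value is at or below the threshold.

    Assumption: a lower Fisher p-value means stronger evidence that the
    variant differs between the normal and the tumor sample.
    """
    columns = record_line.split()
    info = parse_info(columns[7])  # INFO is the 8th column of a VCF record
    return "FP" in info and float(info["FP"]) <= p_threshold


RECORD = (
    "Chr1\t883814\t.\tA\tG\t18.1\tmrd10\t"
    "DP=36;UPSTREAM(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000496938|);"
    "FP=0.00604\tGT:PL:DP\t0/0:0,96,255:32\t0/1:51,0,26:3"
)

if __name__ == "__main__":
    for threshold in (1.0, 0.05, 0.001):
        print(threshold, is_somatic(RECORD, threshold))
```

Under this assumption a threshold of 1 keeps every variant that carries an FP value, which would match its role as the most permissive of the default p-values.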


Results

Script's usage description...

usage: main.pl -f file [-template file] [-p value] [-s value] [-project "string"] [-cnv "string"] [-methods] [-cosmic file] [-h]

-h    this (help) message

-f file    variant call format file (.vcf) to be analyzed

-template file    Template Toolkit file (.tt) to be used as a template. If not defined, the default ("reporttemplate.tt") is used

-p value    add extra p-values to the default p-values (1, 0.05 and 0.001) used for the somatic variants filtering

-s value    filter somatic variants only for the p-values specified by this option

-cosmic file    COSMIC database file for SnpSift annotation (default "CosmicCodingMuts_v68")

-cnv "string"    path where the script will look for the Control-FREEC output. If it is found, it is added to the report

-project "string"    add the name of the project to the report title page

-methods    print the methods appendix in the report (if not defined, it will not be printed)
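The option set above can be mirrored with Python's `argparse` to show how the `-p` and `-s` options interact. This is only a sketch, not the Perl script's option handling. Two points are assumptions: that `-p` takes a comma-separated list like the `-s` value used in the example invocation, and that `-s` takes precedence when both are given. The names `build_parser` and `resolve_thresholds` are hypothetical.

```python
import argparse

DEFAULT_P_VALUES = [1.0, 0.05, 0.001]  # defaults listed in the usage text


def parse_p_values(text: str) -> list[float]:
    """Turn a comma-separated string such as '1,0.001' into floats."""
    return [float(value) for value in text.split(",")]


def build_parser() -> argparse.ArgumentParser:
    # argparse supplies -h itself, so it is not declared here.
    parser = argparse.ArgumentParser(prog="report_sketch", allow_abbrev=False)
    parser.add_argument("-f", required=True, metavar="file")
    parser.add_argument("-template", default="reporttemplate.tt", metavar="file")
    parser.add_argument("-p", type=parse_p_values, default=[], metavar="value")
    parser.add_argument("-s", type=parse_p_values, default=None, metavar="value")
    parser.add_argument("-project", default=None, metavar="string")
    parser.add_argument("-cnv", default=None, metavar="string")
    parser.add_argument("-methods", action="store_true")
    parser.add_argument("-cosmic", default="CosmicCodingMuts_v68", metavar="file")
    return parser


def resolve_thresholds(args: argparse.Namespace) -> list[float]:
    """-s replaces the defaults; otherwise -p values extend them."""
    if args.s is not None:
        return args.s
    return DEFAULT_P_VALUES + args.p


if __name__ == "__main__":
    demo = build_parser().parse_args(["-f", "PatientX.vcf", "-p", "0.01"])
    print(resolve_thresholds(demo))
```

Keeping the threshold resolution in its own function makes the difference between "extend the defaults" (`-p`) and "replace the defaults" (`-s`) explicit.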


Results


$ perl main.pl -f PatientX.vcf -s '1,0.001' -cnv "/Project/Production/DAT/CNV/" -project "FAMCOLON" -methods


Conclusions

1) We developed a functional tool that automatically generates a report document for the somatic variants found in a Normal/Tumor experiment.

2) The content of the report is acceptable but it can be improved.

3) The tool has been successfully tested. It has also already been integrated into CNAG's variant calling pipeline, where it runs as the last step.

4) The template document is independent from the main script. Together with the set of configurable parameters of the main script, this makes the tool highly customizable.

5) The tool is not limited by computational resources. The execution time and memory usage it requires do not seem to be a limiting factor for its usage.

Tool's ultimate aim: to ease the transfer of information from basic research to clinical diagnostics.


Thank you for your attention