Pattern Recognition in Clinical Data
-
Upload
saket-choudhary -
Category
Technology
-
view
282 -
download
2
Transcript of Pattern Recognition in Clinical Data
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
Pattern Recognition In Clinical Data
Saket ChoudharyDual Degree Project
Guide: Prof. Santosh Noronha
C G C A T C G A G C T
C G C G T C G A G C T
October 30, 2013
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
INTRODUCTION
INTRODUCTION
Objective
SIGNIFICANT MUTATIONS
Next Generation SequencingMotivationComputational Methods for Driver Detection
VIRAL GENOME DETECION
Workflow
REPRODUCIBILITY
Reproducibility
CONCLUSIONS
Wrapping up
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
OBJECTIVE
Next GenerationSequencing
&Cancer Research
Toolsfor
BiologistsReproducible
analysis
Driver& Pas-senger
MutationDetection
GalaxyTools
ViralGenome
Inte-gration
GalaxyWork-flow
Better/NewAlgorithms
Bench-marking
Alignmenttools
BWAv/s
BWA-PSSM
Driver& Pas-senger
MutationDetection
EnsemblMethod
Lit.Survey
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
NEXT GENERATION SEQUENCING
C G C G T C G A G C T A G C A
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
NEXT GENERATION SEQUENCING
C G C G T C G A G C T A G C A
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
NEXT GENERATION SEQUENCING
C G C G T C G A G C T A G C A
G C G T∗ C G A G∗ C T A G∗
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
NEXT GENERATION SEQUENCING
C G C G T C G A G C T A G C A
G C G T∗ C G A G∗ C T A G∗
A G C G C G T C G A G C T A G C A C A
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
NGS: WHY?
I Molecular Approach: Study of variations at the ’base’level
I Low Cost: 1000$ genomeI Faster: Quicker than traditional sequencing techniques
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
NGS: WHY?
I Molecular Approach: Study of variations at the ’base’level
I Low Cost: 1000$ genomeI Faster: Quicker than traditional sequencing techniques
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
NGS: WHY?
I Molecular Approach: Study of variations at the ’base’level
I Low Cost: 1000$ genomeI Faster: Quicker than traditional sequencing techniques
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
NGS: WHERE?
I Study variations, genotype-phenotype associationI Look for ’markers of diseases’I Prognosis
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
NGS: WHERE?
I Study variations, genotype-phenotype associationI Look for ’markers of diseases’I Prognosis
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
NGS: WHERE?
I Study variations, genotype-phenotype associationI Look for ’markers of diseases’I Prognosis
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
NGS: MUTATIONS
I 3x109 base pairsI We are all 99.9% similar, at DNA levelI More than 2 million SNPsI No particular pattern of SNPsI If a certain mutation causes a change in an amino acid, it is
referred to as non synonymous(nsSNV)
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
DRIVERS AND PASSENGERS I
Cancer is known to arise due to mutationsNot all mutations are equally important!
Somatic MutationsSet of mutations acquired after zygote formation, over andabove the germline mutations
Driver MutationsMutations that confer growth advantages to the cell, beingselected positively in the tumor tissue
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
DRIVERS AND PASSENGERS
Drivers are NOT simply loss of function mutations, but morethan that:
I Loss of function: Inactivate tumor suppressor proteinsI Gain of function: Activates normal genes transforming
them to oncogenesI Drug Resistance Mutations: Mutations that have evolved
to overcome the inhibitory effect of drugs
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
DRIVERS AND PASSENGERS
Drivers are NOT simply loss of function mutations, but morethan that:
I Loss of function: Inactivate tumor suppressor proteinsI Gain of function: Activates normal genes transforming
them to oncogenesI Drug Resistance Mutations: Mutations that have evolved
to overcome the inhibitory effect of drugs
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
DRIVERS AND PASSENGERS
Drivers are NOT simply loss of function mutations, but morethan that:
I Loss of function: Inactivate tumor suppressor proteinsI Gain of function: Activates normal genes transforming
them to oncogenesI Drug Resistance Mutations: Mutations that have evolved
to overcome the inhibitory effect of drugs
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
DRIVER MUTATIONS: WHY?
Identify driver mutations −→ better therapeutic targetsBut how does one zero down upon the exact set? −→experiments are too costly, probably infeasible for 2 million+SNPs −→ Leverage computational analysis
I Low cost of NGS comes with a heavier roadblock of dataanalysis
I Searching among 2 million+ SNPs is a non-trivial, and acomputationally intensive problem
I Softwares have a low consensus ratio amongst them selves←→ Defining a driver, computationally is non-trivial
I However there is no tool that allows one to visualise theresults on an input across the cohort of tools
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
DRIVER MUTATIONS: WHY?
Identify driver mutations −→ better therapeutic targetsBut how does one zero down upon the exact set? −→experiments are too costly, probably infeasible for 2 million+SNPs −→ Leverage computational analysis
I Low cost of NGS comes with a heavier roadblock of dataanalysis
I Searching among 2 million+ SNPs is a non-trivial, and acomputationally intensive problem
I Softwares have a low consensus ratio amongst them selves←→ Defining a driver, computationally is non-trivial
I However there is no tool that allows one to visualise theresults on an input across the cohort of tools
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
DRIVER MUTATIONS: WHY?
Identify driver mutations −→ better therapeutic targetsBut how does one zero down upon the exact set? −→experiments are too costly, probably infeasible for 2 million+SNPs −→ Leverage computational analysis
I Low cost of NGS comes with a heavier roadblock of dataanalysis
I Searching among 2 million+ SNPs is a non-trivial, and acomputationally intensive problem
I Softwares have a low consensus ratio amongst them selves←→ Defining a driver, computationally is non-trivial
I However there is no tool that allows one to visualise theresults on an input across the cohort of tools
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
DRIVER MUTATIONS: WHY?
Identify driver mutations −→ better therapeutic targetsBut how does one zero down upon the exact set? −→experiments are too costly, probably infeasible for 2 million+SNPs −→ Leverage computational analysis
I Low cost of NGS comes with a heavier roadblock of dataanalysis
I Searching among 2 million+ SNPs is a non-trivial, and acomputationally intensive problem
I Softwares have a low consensus ratio amongst them selves←→ Defining a driver, computationally is non-trivial
I However there is no tool that allows one to visualise theresults on an input across the cohort of tools
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
MACHINE LEARNING ITwo datasets:
I Training: Labeled dataset, containing a table of featureswith mutations labelled as ”drivers/passengers”
I Test: ’Learning’ from training dataset, test the predictionmodel
Table: Training Dataset
Chromosome Position Ref Alt Type1 27822 A G Driver1 27832 T G Driver2 47842 G C Passenger. . . . .. . . . .. . . . .
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
MACHINE LEARNING II
Table: Test Dataset
Chromosome Position Ref Alt Type1 27824 A G ?1 47832 T G ?
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
MACHINE LEARNING: FEATURE SELECTION I
Machine Learning relies on a set of features for trainingRedundant features should be avoidedCHASM [1] makes use ofp(Xi) represents the probability of occurrence of an event XiConsidering a series of events X1, X2, X3...,Xn analogous ’seriesof packets’ in communication theory , the information receivedat each step can be quantified on a log scale by:
1log2(Xi)
= −log2(p(Xi)) (1)
The expected value of information from a series of events iscalled shannon entropy: H(X):
H(X) = −∑
i
p(Xi) log2 p(Xi) (2)
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
MACHINE LEARNING: FEATURE SELECTION IIMutual Information between two random variables X,Y isdefined as the amount of information gained about randomvariable X due to additional information gained from thesecond, Y:
I(X,Y) = H(X)−H(X|Y) (3)
Here:X: Class Label[Driver/Passenger]Y: Predictive Featureand hence I(X,Y) represents how much information wasgained about the class label Y from knowledge of a feature XSimplifying :
I(X,Y) =∑
p(x, y)log2p(x, y)
p(x)p(y)(4)
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
FUNCTIONAL IMPACT I
I If a certain mutation confers an advantage to the cell interms of replication rate, it is probably going to be selectedwhile all those mutations that reduce its fitness have ahigher chance of being eliminated from the population.
I Certain residues in a MSA of homologous sequences aremore conserved than others. A highly conserved ifmutated is possibly going to cost a lot since what had’evolved’ is disturbed!
I Scores can be assigned based on this ”conservation”parameter.
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
FUNCTIONAL IMPACT IIFigure: SIFT [?] algorithm
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
Some of the common tools/algorithms used for drivermutation prediction:
I SIFTI PolyphenI Mutation AssesorI TransFICI Condel
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
FRAMEWORK FOR COMPARING VARIOUS TOOLS I
I Different tools use different formats, give different outputsfor similar input
I Running analysis on multiple tools −→ keep shifting dataformats
I Concordance?
Polyphen2 Input
chr1:888659 T/Cchr1:1120431 G/Achr1:1387764 G/Achr1:1421991 G/Achr1:1599812 C/Tchr1:1888193 C/Achr1:1900186 T/C
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
FRAMEWORK FOR COMPARING VARIOUS TOOLS II
SIFT Input
1,888659,T,C1,1120431,G,A1,1387764,G,A1,1421991,G,A1,1599812,C,T1,1888193,C,A1,1900186,T,C
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
DRIVER MUTATIONS: TOOLS DON’T AGREE
X Axis: Condel Score Y Axis: MA Score
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
Solution?:Galaxy[?], an open source web-based platform forbioinformatics, makes it possible to represent the entire dataanalysis pipeline in an intuitive graphical interface
Figure: Galaxy Workflow polyphen2 algorithm
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
Run all tools in one go:
Figure: Run all tools
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
Compare all tools:
Figure: Compare all tools
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
VIRAL GENOME DETECION
Cervical cancers have been proven to be associated withHuman Papillomavirus(HPV)Cervical cancer datasets from Indian women was put throughan analysis to detect :
1. Any possible HPV integration2. Sites of HPV integration
Who Cares?I Replacing whole genome sequencing, by targeted
sequencing at the sites where these virus have beendetected in a cohort of samples, thus speeding up thewhole process.
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
RawData
Align datato humangenome
reference
Extractunmapped
regions
Alignunmapped
regionsto Virusgenome
ExtractmappedregionsBLAST
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
RawData
Align datato humangenome
reference
Extractunmapped
regions
Alignunmapped
regionsto Virusgenome
ExtractmappedregionsBLAST
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
RawData
Align datato humangenome
reference
Extractunmapped
regions
Alignunmapped
regionsto Virusgenome
ExtractmappedregionsBLAST
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
RawData
Align datato humangenome
reference
Extractunmapped
regions
Alignunmapped
regionsto Virusgenome
ExtractmappedregionsBLAST
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
RawData
Align datato humangenome
reference
Extractunmapped
regions
Alignunmapped
regionsto Virusgenome
ExtractmappedregionsBLAST
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
RawData
Align datato humangenome
reference
Extractunmapped
regions
Alignunmapped
regionsto Virusgenome
ExtractmappedregionsBLAST
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
RawData
Align datato humangenome
reference
Extractunmapped
regions
Alignunmapped
regionsto Virusgenome
ExtractmappedregionsBLAST
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
GALAXY WORKFLOW
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
Figure: Aligned HPV genomes
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
REPRODUCIBILITY
I In pursuit of novel ’discovery’, standardizing the dataanalysis pipeline is often ignored, leading to dubiousconclusions
I Analysis should be reproducible and above all, correctI Parameter’s values can change the results by a big factor,
they need to be documented/loggedI Garbage in, Garbage out
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
CONCLUSIONS
With the Galaxy tool box for identification of significantmutations and the study of the science behind the methods, thenext steps would be to:
I Open source the toolbox to the community: A tool makeslittle sense if it is not in a usable form, communityfeedback will be used to add more tools and improve theexisting ones
I A new method for driver mutation prediction: all themethods have low level of concordance. A new methodthat takes into account the available data at all levels :mutations, transcriptome and micro array data is possible.With the Galaxy toolbox in place, it would be possible tointegrate information at various levels
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
FUTURE WORK
I Develop an algorithm that integrates machine learningapproach with functional approach by zeroing down upononly those attributes that are known to have an impact
I The algorithm would also account for information at otherlevels: RNA expressions, Clinical data.
I Integrating information at all levels would provide adeeper insight
I The developed Galaxy toolbox will be used a the basicframework for integrating information
INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECION REPRODUCIBILITY CONCLUSIONS
REFERENCES I
Hannah Carter, Sining Chen, Leyla Isik, SvitlanaTyekucheva, Victor E Velculescu, Kenneth W Kinzler, BertVogelstein, and Rachel Karchin.Cancer-specific high-throughput annotation of somaticmutations: computational prediction of driver missensemutations.Cancer research, 69(16):6660–6667, 2009.