Posted on 20-Dec-2015
Integrative Analysis of Biological Data
Sai Moturu
MAGIC
Multisource Association of Genes by Integration of Clusters
Goal: Integrate heterogeneous types of high-throughput data for accurate gene function prediction
Approach: Bayesian reasoning that incorporates expert knowledge, applied to yeast data
Integrative analysis! Why?
High-throughput methods sacrifice specificity for scale
Microarray data alone is good for hypothesis generation but lacks the specificity needed for accurate gene function prediction
Using heterogeneous functional data improves prediction accuracy
Need for MAGIC
Earlier studies combined different types of data heuristically, on a case-by-case basis
No general scheme or probabilistic representation was applied
Existing methods are designed to combine specific data types only
MAGIC: a general method to integrate disparate data sources
Input to MAGIC
Input: a gene-gene relation matrix for each data source
Each matrix element is a score indicating whether there could be a relationship between two genes
The score can be binary, discrete or continuous
The input format is flexible and allows a gene to belong to more than one group or cluster
It therefore does not exclude biclustering or fuzzy clustering methods
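A minimal Python sketch of how one such gene-gene relation matrix might be built from a clustering result in which a gene may belong to several clusters. The function name `relation_matrix`, the optional per-cluster `weights`, and the rule of keeping the strongest score when a pair co-occurs in several clusters are illustrative assumptions, not part of MAGIC's specification.

```python
from itertools import combinations

def relation_matrix(genes, clusters, weights=None):
    """Build a symmetric gene-gene score matrix from possibly overlapping clusters.

    genes:    list of gene names; fixes the row/column order
    clusters: list of sets of gene names; a gene may appear in several sets
    weights:  optional per-cluster scores (hypothetical); default 1.0 gives a binary matrix
    """
    index = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    matrix = [[0.0] * n for _ in range(n)]
    if weights is None:
        weights = [1.0] * len(clusters)
    for cluster, w in zip(clusters, weights):
        for a, b in combinations(sorted(cluster), 2):
            i, j = index[a], index[b]
            # Assumption: if a pair shares several clusters, keep the strongest score.
            matrix[i][j] = matrix[j][i] = max(matrix[i][j], w)
    return matrix
```

For example, with clusters `{"g1", "g2", "g3"}` and `{"g3", "g4"}`, gene `g3` is in both, so it is linked to every other gene, while the pair `g1`/`g4` stays at 0. Passing continuous `weights` yields a continuous rather than binary matrix.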
Structure of the MAGIC Bayesian network
Prior probabilities assessed by experts
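How an expert-assessed prior and per-source evidence combine into a posterior can be sketched as follows. This is a deliberately simplified naive Bayes model that assumes the data sources are conditionally independent given whether two genes are functionally related; it is not the actual MAGIC network structure, and the source names and probabilities in the usage note are hypothetical.

```python
def posterior_related(prior, observations, cpts):
    """P(two genes functionally related | evidence), naive Bayes simplification.

    prior:        expert-assessed P(related)
    observations: dict source -> observed score level, e.g. "high" or "low"
    cpts:         dict source -> {level: (P(level | related), P(level | unrelated))}
    Sources absent from `observations` (missing data) are simply skipped.
    """
    p_rel, p_unrel = prior, 1.0 - prior
    for source, level in observations.items():
        given_rel, given_unrel = cpts[source][level]
        p_rel *= given_rel        # joint probability under "related"
        p_unrel *= given_unrel    # joint probability under "unrelated"
    return p_rel / (p_rel + p_unrel)  # normalise (Bayes' rule)
```

With a prior of 0.1, a hypothetical microarray source with P(high | related) = 0.6 and P(high | unrelated) = 0.3, and a hypothetical interaction source with P(yes | related) = 0.5 and P(yes | unrelated) = 0.05, observing both "high" and "yes" gives 0.03 / (0.03 + 0.0135), about 0.69. With no observations the function returns the prior unchanged.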
Evaluation
No gold standard for gene groupings exists
GO is the best available reflection of current biological knowledge
A cutoff of 3 levels in the GO hierarchy is used to decide whether two genes are functionally related
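The depth cutoff can be illustrated with a small Python sketch. It assumes one possible reading of the rule: annotations are propagated to all ancestor terms, depth is the shortest distance from the root (root = level 1), and two genes count as related if they share any term at level 3 or deeper. The exact convention used in the evaluation may differ, and the toy ontology, term names and gene names are hypothetical.

```python
from collections import deque

def term_depths(parents, root):
    """Shortest-path depth of every term reachable from the root (root is level 1)."""
    children = {}
    for term, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(term)
    depth, queue = {root: 1}, deque([root])
    while queue:  # breadth-first search gives shortest depths in a DAG
        t = queue.popleft()
        for c in children.get(t, []):
            if c not in depth:
                depth[c] = depth[t] + 1
                queue.append(c)
    return depth

def ancestors(term, parents):
    """The term itself plus every ancestor reachable through parent links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def functionally_related(g1, g2, annotations, parents, root, cutoff=3):
    """True if the genes share a (propagated) term at depth >= cutoff."""
    depth = term_depths(parents, root)
    def expand(gene):
        terms = set()
        for t in annotations.get(gene, ()):
            terms |= ancestors(t, parents)
        return terms
    shared = expand(g1) & expand(g2)
    return any(depth.get(t, 0) >= cutoff for t in shared)
```

In a toy hierarchy where `root` has children `B` and `C`, `B` has children `D` and `E`, and `D` has children `F` and `G`, genes annotated to `F` and `G` share `D` (level 3) and are related, whereas genes annotated to `F` and `E` share only `B` (level 2) and `root`, so they are not.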
Results
AVID
Annotation Via Integration of Data
Integrates data to build high-confidence networks in which proteins are connected if they are likely to share a common annotation
AVID predicts functional annotations in all three GO categories (molecular function, biological process and cellular component)
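The two ideas on this slide, building a high-confidence network from integrated evidence and then annotating proteins through their neighbours, can be sketched in Python. Both techniques here are swapped-in, generic choices rather than AVID's documented stages: evidence is combined with a noisy-OR (sources treated as independent), and annotations are predicted by neighbour voting. The `threshold` and `min_fraction` values, protein IDs and term labels are hypothetical. The prediction step would be run once per GO category.

```python
from collections import Counter

def build_network(pair_evidence, threshold=0.8):
    """Connect protein pairs whose combined evidence reaches `threshold`.

    pair_evidence maps (protein_a, protein_b) to a list of per-source
    probabilities that the pair shares an annotation.
    """
    network = {}
    for (a, b), probs in pair_evidence.items():
        p_none = 1.0
        for p in probs:
            p_none *= 1.0 - p  # probability that no source supports the link
        if 1.0 - p_none >= threshold:  # noisy-OR combined confidence
            network.setdefault(a, set()).add(b)
            network.setdefault(b, set()).add(a)
    return network

def predict_annotations(protein, network, known, min_fraction=0.5):
    """Terms carried by at least `min_fraction` of the protein's annotated neighbours.

    known maps protein -> set of terms for a single GO category.
    """
    annotated = [n for n in network.get(protein, ()) if known.get(n)]
    if not annotated:
        return set()
    votes = Counter(t for n in annotated for t in known[n])
    return {t for t, c in votes.items() if c / len(annotated) >= min_fraction}
```

For instance, two sources reporting 0.6 and 0.7 for a pair combine to 1 - 0.4 × 0.3 = 0.88, which passes a 0.8 threshold, while a single source at 0.5 does not. Raising `min_fraction` trades coverage for precision in the predicted annotations.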
AVID stages
AVID results