Visualization and Analysis Workflow
December 14, 2009 Draft
/ber
The concept of a Workflow
• Express the analysis of plant systems in terms of the data and operations on those data
  – Multiple types of data (e.g., experimental, computed, archival)
  – Multiple types of operations (e.g., analytical, visualization, search)
• Treat the data and operations as components, which can be re-used, replaced, augmented, and extended.
Workflow
• A pathway of operations
• Entities:
  – Operation
  – Data
  – Flow
• The flow through the operations is managed by the workflow software (e.g., VizTrails)
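The three entities can be illustrated with a minimal Python sketch. It is only an illustration of the idea, not the workflow software's actual interface: the `Operation` class, `run_workflow`, and the two gene-ID operations are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Operation:
    """A named, replaceable step that transforms data."""
    name: str
    func: Callable[[Any], Any]


def run_workflow(operations: List[Operation], data: Any) -> Any:
    """Pass the data through each operation in order (the 'flow')."""
    for op in operations:
        data = op.func(data)
    return data


# Hypothetical operations on a list of gene IDs.
normalize = Operation("normalize", lambda ids: [g.strip().upper() for g in ids])
deduplicate = Operation("deduplicate", lambda ids: sorted(set(ids)))

result = run_workflow([normalize, deduplicate], [" gene_b", "gene_a ", "GENE_B"])
print(result)
```

Because each operation is a separate component, one step can be replaced or a new one appended without touching the others.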
Multi-layer workflows
List of genes
Co-expression analysis
Network
Conceptual Level: High-level representation for casual users, with lots of defaults pre-selected
Professional Level: Visibility into underlying workflows, with freedom to select tools and parameters
List of genes
Network
Analysis of omics data
Infrastructure Level: The explicit treatment of underlying data, databases, data integration, tools, operations, parameters, defaults, wrappers, provenance, interconnectivity, access, etc.
Statistical analysis tool
Interactive Visual Analysis
Metabolites
Pathways
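One way to picture the relationship between the conceptual and professional levels is a thin layer of defaults over a fully parameterized call. The sketch below assumes hypothetical settings (`tool`, `threshold`, `species`) and function names; none of them come from the slides.

```python
from typing import Any, Dict, List, Optional

# Assumed defaults that the conceptual level would pre-select for a casual user.
DEFAULTS: Dict[str, Any] = {
    "tool": "co-expression",
    "threshold": 0.8,
    "species": "Arabidopsis",
}


def professional_workflow(genes: List[str], tool: str, threshold: float,
                          species: str) -> Dict[str, Any]:
    """Professional level: every tool and parameter is chosen explicitly."""
    return {"genes": list(genes), "tool": tool,
            "threshold": threshold, "species": species}


def conceptual_workflow(genes: List[str],
                        overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Conceptual level: supply a gene list; defaults fill in everything else."""
    settings = {**DEFAULTS, **(overrides or {})}
    return professional_workflow(genes, **settings)


casual_plan = conceptual_workflow(["gene1", "gene2"])
expert_plan = conceptual_workflow(["gene1", "gene2"], {"threshold": 0.5})
print(casual_plan)
print(expert_plan)
```

The infrastructure level would sit beneath `professional_workflow`, handling databases, wrappers, and provenance; it is left out of this sketch.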
VizTrails - a candidate workflow architecture
• Visual programming interface for representing data and operations as workflows
• Loose coupling, using parameterizable Python wrappers
• Extensible, flexible, re-usable components and workflows
• Coupled with an attractive, flexible User Interface (to be developed)
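As a rough illustration of loose coupling through parameterizable Python wrappers, the sketch below wraps an arbitrary function behind a uniform `run()` method. It does not use or reproduce the VizTrails API; `ToolWrapper` and `filter_by_expression` are hypothetical names invented for the example.

```python
from typing import Any, Callable, Dict


class ToolWrapper:
    """Wrap any callable tool behind a uniform run() method with stored parameters."""

    def __init__(self, name: str, tool: Callable[..., Any], **params: Any) -> None:
        self.name = name
        self.tool = tool
        self.params = dict(params)

    def with_params(self, **overrides: Any) -> "ToolWrapper":
        """Return a new wrapper around the same tool with some parameters changed."""
        return ToolWrapper(self.name, self.tool, **{**self.params, **overrides})

    def run(self, data: Any) -> Any:
        """Call the wrapped tool on the data using the stored parameters."""
        return self.tool(data, **self.params)


def filter_by_expression(values: Dict[str, float], minimum: float = 1.0) -> Dict[str, float]:
    """Hypothetical tool: keep genes whose expression is at least `minimum`."""
    return {gene: v for gene, v in values.items() if v >= minimum}


expression_filter = ToolWrapper("expression filter", filter_by_expression, minimum=2.0)
strict_filter = expression_filter.with_params(minimum=5.0)

sample = {"geneA": 1.5, "geneB": 3.0, "geneC": 7.2}
kept = expression_filter.run(sample)
kept_strict = strict_filter.run(sample)
print(kept, kept_strict)
```

The workflow only ever calls `run()`, so the underlying tool can be swapped or re-parameterized without changing the rest of the pipeline.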
Conceptual workflow
Professional workflow
Provenance and metadata
Interactive visualizations
Example Workflows from iPLANT team
• Goals:
  – Demonstrate the use of a workflow model for representing the data and processes in plant genomic research exploration
  – Provide a common structure for iPLANT use cases
  – Help define requirements for data integration
  – Motivate discussions about analyses that join multiple types of data, allow users to interact dynamically, and provide interactive painting across visual representations (e.g., painting a metabolic pathway with gene expression magnitude)
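The painting idea in the last goal can be sketched as a mapping from expression magnitude to a color for each gene in a pathway. This is a minimal sketch under assumed conventions: the white-to-red scale, the gray color for missing genes, and the function names are all hypothetical choices, not part of any tool named in these slides.

```python
from typing import Dict, List


def expression_to_color(value: float, low: float, high: float) -> str:
    """Map an expression value onto a white-to-red hex color."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp to the [low, high] range, then scale to 0..1.
    fraction = min(max((value - low) / (high - low), 0.0), 1.0)
    # Green and blue channels fade as expression rises, leaving pure red at the top.
    fade = round(255 * (1.0 - fraction))
    return f"#ff{fade:02x}{fade:02x}"


def paint_pathway(pathway_genes: List[str], expression: Dict[str, float],
                  low: float, high: float, missing: str = "#cccccc") -> Dict[str, str]:
    """Assign a color to every gene in the pathway; genes without data get gray."""
    return {
        gene: expression_to_color(expression[gene], low, high)
        if gene in expression else missing
        for gene in pathway_genes
    }


# Hypothetical pathway and expression values.
colors = paint_pathway(["geneA", "geneB", "geneC"],
                       {"geneA": 0.0, "geneB": 10.0}, low=0.0, high=10.0)
print(colors)
```

A visualization layer would then draw each pathway node in its assigned color.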
Workflow for Maize Gene Analysis
List of 20 homologous maize gene IDs
Find expression values for these genes (e.g., Next Gen)
For each, examine structure of transcripts and expression over time (e.g., eFP Maize Genome Browser)
5 genes of interest
List of homologous Arabidopsis gene IDs
Modeling and Statistical Inference
Literature search
Homolog Finder (e.g., CoGe)
Candidate maize gene
Co-Expression Analysis (e.g., ATTED2)
Expression Network of 10 Arabidopsis Genes
Homolog Finder (e.g., CoGe)
Expression data for 20 maize genes
/tb/ber
Examine clusters that can handle maize data (e.g., eNorthern, MapMan)
note: very limited data for maize so may need to go to rice
iterate
Workflow for Analysis of Omics Data in a Model Species
Gene expression data
Visually identified genes and metabolites to map onto functional pathways
iterate
Visualize
Inferred Protein-Protein interactions
Visually-identified enriched pathways
Visually-identified, cell-based, network regions of interest
Interactive Visual & Statistical Analysis (e.g., ViVA, Co-expression analysis, PlantMetGenMap, Gene Mania)
Expression Analysis
Metabolite Data
Identify sub-cellular locations of gene products (e.g., Interactome)
• Integrated gene expression and metabolomic data
• Interactive visual and statistical analysis
• Explicit support for iterative what-if analysis
/rg/ber
Testable Hypotheses
Other Data Sources to be Incorporated
1. Motifs from Regulatory Regions in Model Species
2. Cell-specific Expression
3. Pathways Wiki, place gene(s) of interest in established pathways.
4. Metabolites, incorporate information from Reactome
5. Literature, PubMed Assistant???
Depiction Needed
Displays of inferred regulatory networks, as in Gene Mania.
Analysis of Gene Expression from a Partially Sequenced Species
Experimental exposure of plants to stress
Highly expressed genes
Paint identified genes onto pathways (e.g., MapMan)
Identification of homologs in reference species (e.g., CoGe)
Identification of candidate homologs that have been reported as co-expressed (e.g., statistical correlation)
Compare magnitude of activity across reference pathways (e.g., PageMan, KEGG, GO, MapMan)
Meta Annotator: Explore known features of these genes (e.g., signaling pathways, eFP, literature)
Visualization of enriched pathways
Ecophysiological data
7 Formulate mechanistic models
Co-expressed genes for reference species
/rg/ber
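The co-expression step in the workflow above names statistical correlation as one possible criterion. A minimal sketch of that idea follows, assuming Pearson correlation over expression profiles and a hypothetical cutoff of 0.9; the slides do not specify which statistic or threshold would actually be used, and the profile values are made up for illustration.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (spread_x * spread_y)


def coexpressed_pairs(profiles: Dict[str, List[float]],
                      threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """Return gene pairs whose correlation meets the (assumed) threshold."""
    pairs = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene_a], profiles[gene_b])
        if r >= threshold:
            pairs.append((gene_a, gene_b, r))
    return pairs


# Hypothetical expression profiles across four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
pairs = coexpressed_pairs(profiles)
print(pairs)
```

The resulting pairs form the edges of a co-expression network, which could then be passed to the visualization steps.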