Visualization and Analysis Workflow

Visualization and Analysis Workflow December 14, 2009 Draft /ber


Page 1: Visualization and Analysis Workflow

Visualization and Analysis Workflow

December 14, 2009 Draft

/ber

Page 2: Visualization and Analysis Workflow

The concept of a Workflow

• Express the analysis of plant systems in terms of the data and operations on those data

– Multiple types of data (e.g., experimental, computed, archival)

– Multiple types of operations (e.g., analytical, visualization, search)

• Treat the data and operations as components, which can be re-used, replaced, augmented, and extended.

Page 3: Visualization and Analysis Workflow

Workflow

• A pathway of operations

• Entities:

– Operation

– Data

– Flow

• The flow through the operations is managed by the workflow software (e.g., VisTrails)
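The three entities above (operation, data, flow) can be illustrated with a minimal Python sketch. The class names `Operation` and `Workflow`, the two example operations, and the gene identifiers are all hypothetical and chosen only for illustration; this is not the API of any particular workflow system.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Operation:
    """A named step that turns input data into output data."""
    name: str
    func: Callable[[Any], Any]


@dataclass
class Workflow:
    """An ordered pathway of operations; the order defines the flow."""
    operations: List[Operation] = field(default_factory=list)

    def add(self, op: Operation) -> "Workflow":
        self.operations.append(op)
        return self

    def run(self, data: Any) -> Any:
        # The workflow, not the individual operations, manages the flow:
        # each operation's output becomes the next operation's input.
        for op in self.operations:
            data = op.func(data)
        return data


# Hypothetical operations on a list of gene IDs (made-up identifiers).
normalize = Operation("normalize", lambda ids: [g.strip().upper() for g in ids])
deduplicate = Operation("deduplicate", lambda ids: sorted(set(ids)))

workflow = Workflow().add(normalize).add(deduplicate)
result = workflow.run([" geneA", "geneb ", "GENEA"])
print(result)
```

Because each operation only sees the data handed to it, any one of them can be replaced or re-used in another workflow without touching the rest.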

Page 4: Visualization and Analysis Workflow

Multi-layer workflows

Diagram boxes: List of genes; Co-expression analysis; Network

Conceptual Level: High-level representation for casual users, with lots of defaults pre-selected

Professional Level: Visibility into underlying workflows, with freedom to select tools and parameters

Diagram boxes: List of genes; Network; Analysis of omics data

Infrastructure Level: The explicit treatment of underlying data, databases, data integration, tools, operations, parameters, defaults, wrappers, provenance, interconnectivity, access, etc.

Diagram boxes: Statistical analysis tool; Interactive Visual Analysis; Metabolites; Pathways
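One way to picture the relationship between the Conceptual and Professional levels is a single operation exposed two ways: once with every parameter explicit, and once with defaults pre-selected. The sketch below is only an assumption about how that could look; the function names, the `method` and `threshold` parameters, and the default values are hypothetical, and the analysis itself is a placeholder.

```python
from typing import Dict, List

# Hypothetical defaults that a conceptual-level user never has to see.
DEFAULTS = {"method": "pearson", "threshold": 0.8}


def professional_coexpression(
    genes: List[str], method: str, threshold: float
) -> Dict[str, object]:
    """Professional level: every tool choice and parameter is explicit."""
    # Placeholder for a real analysis: record what would be run.
    return {"genes": list(genes), "method": method, "threshold": threshold}


def conceptual_coexpression(genes: List[str]) -> Dict[str, object]:
    """Conceptual level: the same operation with defaults pre-selected."""
    return professional_coexpression(genes, **DEFAULTS)


casual = conceptual_coexpression(["geneA", "geneB"])
expert = professional_coexpression(["geneA", "geneB"], method="spearman", threshold=0.6)
```

Both entry points share one underlying implementation, so the conceptual view stays consistent with what a professional user would see when opening the workflow.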

Page 5: Visualization and Analysis Workflow

VisTrails: a candidate workflow architecture

• Visual programming interface for representing data and operations as workflows

• Loose coupling, using parameterizable Python wrappers

• Extensible, flexible, re-usable components and workflows

• Coupled with an attractive, flexible User Interface (to be developed)

Figure labels: Conceptual workflow; Professional workflow; Provenance and metadata; Interactive visualizations
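The idea of loose coupling through parameterizable Python wrappers can be sketched as follows. This is a generic illustration under stated assumptions, not the wrapper mechanism of any specific workflow package: `ToolWrapper` and `top_n` are hypothetical names, and `top_n` merely stands in for an external analysis tool.

```python
from typing import Any, Callable, Dict


class ToolWrapper:
    """Wraps any callable tool behind one uniform, parameterizable interface."""

    def __init__(self, name: str, tool: Callable[..., Any], **params: Any) -> None:
        self.name = name
        self.tool = tool
        self.params: Dict[str, Any] = dict(params)

    def with_params(self, **overrides: Any) -> "ToolWrapper":
        # Return a new wrapper so a shared component is never mutated.
        merged = {**self.params, **overrides}
        return ToolWrapper(self.name, self.tool, **merged)

    def __call__(self, data: Any) -> Any:
        # The workflow only ever calls wrapper(data); tool details stay inside.
        return self.tool(data, **self.params)


def top_n(values, n=3):
    """Hypothetical stand-in for an external analysis tool."""
    return sorted(values, reverse=True)[:n]


top3 = ToolWrapper("top_n", top_n, n=3)
top2 = top3.with_params(n=2)

print(top3([5, 1, 9, 7]))
print(top2([5, 1, 9, 7]))
```

The calling code depends only on the wrapper's call signature, so the wrapped tool can be swapped or re-parameterized without changing the workflow around it.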

Page 6: Visualization and Analysis Workflow

Example Workflows from iPLANT team

• Goals:

– Demonstrate the use of a workflow model for representing the data and processes in plant genomic research exploration

– Provide a common structure for iPLANT use cases

– Help define requirements for data integration

– Motivate discussions about analyses that join multiple types of data, allow users to interact dynamically, and provide interactive painting across visual representations (e.g., painting a metabolic pathway with gene expression magnitude)
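As a rough illustration of "painting a metabolic pathway with gene expression magnitude", the sketch below maps each pathway gene's expression value to a color on a white-to-red scale. The function names, the color scale, the gray fallback for unmeasured genes, and the gene names and values are all assumptions made for this example.

```python
from typing import Dict, List


def expression_to_color(value: float, vmin: float, vmax: float) -> str:
    """Map an expression magnitude to a white-to-red hex color."""
    if vmax == vmin:
        fraction = 0.0
    else:
        fraction = (value - vmin) / (vmax - vmin)
    fraction = min(1.0, max(0.0, fraction))
    # Red channel stays full; green and blue fade as expression rises.
    fade = round(255 * (1.0 - fraction))
    return f"#ff{fade:02x}{fade:02x}"


def paint_pathway(pathway_genes: List[str], expression: Dict[str, float]) -> Dict[str, str]:
    """Return a color per pathway gene; genes without data are painted gray."""
    measured = [expression[g] for g in pathway_genes if g in expression]
    if not measured:
        return {g: "#cccccc" for g in pathway_genes}
    vmin, vmax = min(measured), max(measured)
    return {
        g: expression_to_color(expression[g], vmin, vmax) if g in expression else "#cccccc"
        for g in pathway_genes
    }


# Made-up gene names and values, for illustration only.
colors = paint_pathway(["geneA", "geneB", "geneC"], {"geneA": 0.0, "geneB": 10.0})
print(colors)
```

A real implementation would hand these colors to whatever component draws the pathway; the point here is only that painting joins two data types (pathway membership and expression) in one step.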

Page 7: Visualization and Analysis Workflow

Workflow for Maize Gene Analysis

Diagram boxes:

– List of 20 homologous maize gene IDs

– Find expression values for these genes (e.g., Next Gen)

– For each, examine structure of transcripts and expression over time (e.g., eFP Maize Genome Browser)

– 5 genes of interest

– List of homologous Arabidopsis gene IDs

– Modeling and Statistical Inference

– Literature search

– Homolog Finder (e.g., CoGe)

– Candidate maize gene

– Co-Expression Analysis (e.g., ATTED2)

– Expression Network of 10 Arabidopsis Genes

– Homolog Finder (e.g., CoGe)

– Expression data for 20 maize genes

– Examine clusters that can handle maize data (e.g., eNorthern, MapMan)

– Note: very limited data for maize, so may need to go to rice

– iterate

/tb/ber

Page 8: Visualization and Analysis Workflow

Workflow for Analysis of Omics Data in a Model Species

Diagram boxes:

– Gene expression data

– Visually identified genes and metabolites to map onto functional pathways

– iterate

– Visualize

– Inferred Protein-Protein interactions

– Visually-identified enriched pathways

– Visually-identified, cell-based, network regions of interest

– Interactive Visual & Statistical Analysis (e.g., ViVA, Co-expression analysis, PlantMetGenMap, Gene Mania)

– Expression Analysis

– Metabolite Data

– Identify sub-cellular locations of gene products (e.g., Interactome)

– Testable Hypotheses

• Integrated gene expression and metabolomic data

• Interactive visual and statistical analysis

• Explicit support for iterative what-if analysis

/rg/ber

Page 9: Visualization and Analysis Workflow

Other Data Sources to be Incorporated

1. Motifs from Regulatory Regions in Model Species

2. Cell-specific Expression

3. Pathways Wiki, place gene(s) of interest in established pathways.

4. Metabolites, incorporate information from Reactome

5. Literature, PubMed Assistant???

Depiction Needed

Displays of inferred regulatory networks, as in Gene Mania.

Page 10: Visualization and Analysis Workflow

Analysis of Gene Expression from a Partially Sequenced Species

Diagram boxes:

– Experimental exposure of plants to stress

– Highly expressed genes

– Paint identified genes onto pathways (e.g., MapMan)

– Identification of homologs in reference species (e.g., CoGe)

– Identification of candidate homologs that have been reported as co-expressed (e.g., statistical correlation)

– Compare magnitude of activity across reference pathways (e.g., PageMan, KEGG, GO, MapMan)

– Meta Annotator: Explore known features of these genes (e.g., signaling pathways, eFP, literature)

– Visualization of enriched pathways

– Ecophysiological data

– 7: Formulate mechanistic models

– Co-expressed genes for reference species

/rg/ber
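The co-expression step in the workflow above names statistical correlation as an example. A minimal sketch of that idea, assuming Pearson correlation over per-gene expression profiles, is shown below. The function names, the 0.9 threshold, the choice to count strong negative correlation as well, and the gene names and values are all assumptions for illustration, not the method of any tool named in the slides.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(
    profiles: Dict[str, List[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """List gene pairs whose absolute correlation meets the threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:  # assumption: anti-correlation also counts
            pairs.append((g1, g2, r))
    return pairs


# Made-up profiles across four conditions, for illustration only.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
pairs = coexpressed_pairs(profiles)
print(pairs)
```

The resulting pair list is the kind of output that could feed the "Co-expressed genes for reference species" box, with each pair becoming an edge in an expression network.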