BeeSpace Informatics: Interactive System for Functional Analysis Bruce Schatz Institute for Genomic Biology University of Illinois at Urbana-Champaign www.beespace.uiuc.edu Fifth Annual Project Workshop IGB, Urbana IL May 22, 2009


Page 2: Concept Navigation in BeeSpace

[Figure labels: Neuroscience Literature; Molecular Biology Literature; Bee Literature; FlyBase, WormBase; Bee Genome; Brain Region Localization; Brain Gene Expression Profiles; Behavioral Biologist; Molecular Biologist; Neuroscientist]

Page 3: Informatics: From Bases to Spaces

data Bases support genome data
- e.g. FlyBase has sequences and maps
- Genes annotated by Gene Ontology and linked to biological literature

information Spaces support biological literature
- e.g. BeeSpace uses automatically generated conceptual relationships to navigate functions

Page 4: System Architecture

Page 5: System Versions

V1 Filter Concept Graph
- Search, Expand, Merge, Switch, Visualize

V2 Cluster Conceptual Groupings
- Small Worlds (Natural), Language Model (Steerable), Concepts/Documents

V3 Summarize Gene Descriptions
- Gene Extraction, Sentence Classification

V4 Analyze Functional Concepts
- Concept Identification, Category Grouping

V5 Answer Entity Relationships
- Entities, Relations, Templates

Page 6: Informatics Researchers (Faculty)

Investigators:
- Bruce Schatz, systems (Medical Information Science)
- ChengXiang Zhai, algorithms (Computer Science)

Collaborators (students):
- Saurabh Sinha, Computer Science
- Jiawei Han, Computer Science
- Sheng Zhong, Bioengineering
- Nathan Price, Chemical & Biomolecular Engineering

Collaborators (advice):
- John MacMullen, Library & Information Science
- Dan Roth, Computer Science
- Roxana Girju, Linguistics
- Karrie Karahalios, Computer Science

Page 7: Informatics Researchers (Staff)

V1-V3
- Todd Littell, research programmer
- Jim Buell, research coordinator
- Nyla Ismail, biology postdoc
- Moushumi Sen Sarma, biology postdoc

V4-V5
- David Arcoleo, research programmer
- Barry Sanders, research programmer
- Moushumi Sen Sarma, biology postdoc
- Radhika Khetani, biology postdoc

Page 8: Informatics Researchers (Students)

V1 Filter (parse): Jing Jiang, Azadeh Shakery, Yuanhua Lv
V2 Cluster (group): Brant Chee, Qiaozhu Mei, Peixiang Zhao
V3 Summarize (classify): Xu Ling, Jing Jiang, Qiaozhu Mei, Xin He
V4 Analyze (annotate): Xin He, Brant Chee, Moushumi Sarma, Xu Ling
V5 Answer (extract): Xu Ling, Xin He, Yanen Li, Yue Lu

Page 9: Analysis Environment: Features

SPACE is a Paradigm not a Metaphor!

Point of View for YOUR Problem

Externally:
- Dynamically describe custom Region of Space
- Merge Regions to form Hypothesis Space
- Differentially express genes against Space

Page 10: Analysis Environment: System

Concepts and Genes are Universal Entities!

Uniformly Represented, Uniformly Manipulated

Internally:
- Extract and Index Concepts within Collections
- Navigate Concepts within Documents
- Follow Genes from Documents into Databases

Page 11: Automatic Categorization v2

Sorting of Spaces based on Metadata

Sorting of Spaces based on Ontology
- MeSH for Medline Abstracts
- Gene Ontology computed for documents

Sorting of Spaces based on Clustering
- Natural Maps from Small Worlds
- Steerable Maps from Language Models

Semantic Indexing of Dynamic Spaces
Fast System enables Interactive Sorting!

Page 12: Small World Graph

Page 13: Semantics Deeper and Faster

Semantic Indexing across all of Medline
- Previous Attempts used Word Co-Occurrence
- Now Phrase Parser works general-purpose
- Now Mutual Information full differential

Parallel Optimization of MI Graph
- Real-time Computation, Shared Memory Cluster
- Interactive on our 16PC 256GB RAM workerbee
- Dynamic Spaces then Dynamic Semantic Indexing

Interactive Clustering Natural Map
- Heuristic Approximation, Small Worlds Graphs
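
The slide names mutual information over parsed phrases as the basis of the semantic index but gives no formula. As a rough sketch only, the code below computes pointwise mutual information (PMI) between phrases from document-level co-occurrence counts. The function name `pmi_graph`, the `min_count` parameter, and the toy documents are illustrative assumptions, not the BeeSpace implementation.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_graph(docs, min_count=1):
    """Return {(a, b): pmi} for phrase pairs co-occurring in >= min_count docs.

    docs is a list of phrase collections, one per document (assumed parsed).
    PMI(a, b) = log(p(a, b) / (p(a) * p(b))), with each probability estimated
    as a document frequency divided by the number of documents.
    """
    n = len(docs)
    single = Counter()
    pair = Counter()
    for phrases in docs:
        unique = sorted(set(phrases))          # count each phrase once per doc
        single.update(unique)
        pair.update(combinations(unique, 2))   # keys are alphabetically ordered
    edges = {}
    for (a, b), c in pair.items():
        if c >= min_count:
            edges[(a, b)] = math.log((c / n) / ((single[a] / n) * (single[b] / n)))
    return edges


# Toy documents (made up for illustration).
docs = [
    ["foraging", "brain", "gene expression"],
    ["foraging", "gene expression"],
    ["brain", "mushroom body"],
    ["mushroom body", "brain"],
]
graph = pmi_graph(docs)
```

Pairs that always appear together get a positive weight, pairs that co-occur less often than chance get a negative one, and pairs that never co-occur are simply absent from the graph.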

Page 14: Dynamic Clustering

Page 15: Automatic Curation v3

Automatic Summarization of Genes
- Retrieve relevant sentences about gene
- Classify sentences into important aspects
  - protein domain, homolog/ortholog
  - expression pattern, phenotype function
  - regulatory element, genetic interaction

Generalizing to Biology Entities
- Genes, anatomical, behavior, chemical
- Question answering from biology factoids

Computed Curation from Literature

Page 16: Gene Summary (FlyBase)

[Figure labels: GP, EL, SI, GI, MP, WFPI]

Page 17: Gene Summary (BeeSpace)

Structured summary consists of relevant sentences covering 6 aspects of a gene
- Gene Products (GP)
- Expression Location (EL)
- Sequence Information (SI)
- Wild-type Function & Phenotypic Information (WFPI)
- Mutant Phenotype (MP)
- Genetical Interaction (GI)

Page 18: Drosophila gene Abelson (Abl) tyrosine kinase

Page 19: Tribolium gene Scr

Page 20: Gene Summarizer New Aspects

New categories (proposed by FlyBase curators)
- GP + SI => PS (protein domain or structure)
- SI => HO (homologs or orthologs)
- EL => EP (spatial/temporal expression patterns)
- SI => RE (regulatory element information)
- WFPI + MP => PF (wild-type or mutant phenotype and function)
- GI => IT (genetic or physical interaction)
- New (beyond FlyBase) => PG (population genetics)

Utilize cross-domain information for improving the GS on other organisms.
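
The old-to-new aspect mapping above can be written down as a small lookup table. This is only a transcription of the slide into a Python dictionary; the names `NEW_CATEGORIES` and `candidate_categories` are hypothetical. It makes one point visible: SI feeds three new categories, so relabeling an SI sentence still needs a sentence-level decision.

```python
# New category code -> (old aspect codes it draws on, description), from the slide.
NEW_CATEGORIES = {
    "PS": (("GP", "SI"), "protein domain or structure"),
    "HO": (("SI",), "homologs or orthologs"),
    "EP": (("EL",), "spatial/temporal expression patterns"),
    "RE": (("SI",), "regulatory element information"),
    "PF": (("WFPI", "MP"), "wild-type or mutant phenotype and function"),
    "IT": (("GI",), "genetic or physical interaction"),
    "PG": ((), "population genetics"),  # new, beyond FlyBase: no old aspect
}


def candidate_categories(old_aspect):
    """Return the new category codes that an old aspect code can map to."""
    return sorted(
        code for code, (sources, _) in NEW_CATEGORIES.items() if old_aspect in sources
    )


si_targets = candidate_categories("SI")
gi_targets = candidate_categories("GI")
```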

Page 22: BeeSpace System v3

SPACES and REGIONS
Dynamic and Relative

Space is collection of documents
Region is collection of terms

- Extract creates new Region from old Space
- Map creates new Space from old Region
- New from Old Spaces and Regions via merges
- Summarize classifies Gene within Space
- Annotate finds differential functional expression
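
The Space and Region operations listed above might be modeled roughly as follows. This is a minimal sketch under stated assumptions: a Space is a dict from document id to a set of terms, a Region is a frozenset of terms, Map draws its documents from a separate `corpus` Space, and all class and function names are illustrative rather than taken from BeeSpace.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Space:
    docs: dict  # document id -> set of terms (a collection of documents)


@dataclass
class Region:
    terms: frozenset  # a collection of terms


def extract(space, min_docs=1):
    """Extract: new Region from old Space (terms found in >= min_docs documents)."""
    counts = Counter(t for terms in space.docs.values() for t in set(terms))
    return Region(frozenset(t for t, c in counts.items() if c >= min_docs))


def map_region(region, corpus):
    """Map: new Space from old Region (corpus documents mentioning any region term)."""
    return Space({d: t for d, t in corpus.docs.items() if region.terms & set(t)})


def merge_spaces(s1, s2):
    """Merge (S1, S2) into S3: union of the two document collections."""
    return Space({**s1.docs, **s2.docs})


def merge_regions(r1, r2):
    """Merge two Regions: union of their terms."""
    return Region(r1.terms | r2.terms)


# Toy corpus (made up for illustration).
corpus = Space({
    "d1": {"foraging", "brain"},
    "d2": {"foraging", "pheromone"},
    "d3": {"wing", "flight"},
})
bee = Space({"d1": corpus.docs["d1"]})
region = extract(bee)                 # terms of the starting Space
mapped = map_region(region, corpus)   # documents reachable from those terms
```

Alternating `extract` and `map_region` is what lets a small starting Space grow into a larger one that shares its vocabulary.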

Page 23: BeeSpace Semantic Operations

Merge (S1,S2) into S3

Summarize (S) into Gene classify

Page 40: New Interface v4

Single Window, Multiple Panes
- Space Panel, Service Tabs

SPACES custom, system

FILTER searching, sorting
CLUSTER map natural and steerable
SUMMARIZE categorize using space
ANALYZE annotate using space

Page 48: Functional Analysis v4

The software system goes beyond a searchable database, using statistical literature analyses to discover functional relationships between genes and behavior.

This research will enable all scientists who study bee genes to live on the frontier of integrative biology, where biotechnology enables routine expression analysis and bioinformatics enables functional analysis unconstrained by pre-existing categories.

Genelist Analyzer v4
- Differential Expression of Gene Names against Space
- Background is custom-made Literature Space
- Produces Concept List from Gene List
- Analyze using Concept Navigation and Gene Summarization
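
The slides do not say how the Genelist Analyzer scores concepts. One way to read "differential expression of gene names against Space" is to rank concepts by how much more often they appear in documents mentioning the listed genes than in the background Space as a whole. The sketch below does that with a smoothed log ratio; the scoring rule, the function name `concept_list`, and the toy data are all assumptions for illustration.

```python
import math
from collections import Counter


def concept_list(gene_list, space, top_k=3):
    """Rank concepts over-represented in documents that mention the listed genes.

    space: list of (genes, concepts) pairs, one per document; both are sets.
    Score: log of the add-one-smoothed ratio between a concept's document rate
    in the gene-mentioning subset and its rate in the whole background space.
    """
    genes = set(gene_list)
    fg_docs = [concepts for doc_genes, concepts in space if genes & doc_genes]
    fg = Counter(c for concepts in fg_docs for c in concepts)
    bg = Counter(c for _, concepts in space for c in concepts)
    n_fg, n_bg = len(fg_docs), len(space)
    scores = {
        c: math.log(((fg[c] + 1) / (n_fg + 1)) / ((bg[c] + 1) / (n_bg + 1)))
        for c in fg
    }
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_k]


# Toy background space with placeholder gene names (made up for illustration).
space = [
    ({"geneA"}, {"foraging", "behavior"}),
    ({"geneA", "geneB"}, {"foraging", "cGMP signaling"}),
    ({"geneC"}, {"vitellogenin", "behavior"}),
    ({"geneD"}, {"circadian rhythm", "behavior"}),
]
ranked = concept_list(["geneA", "geneB"], space)
```

Because the background is the custom Space rather than a fixed corpus, a concept such as "behavior" that is common everywhere in this Space ranks below concepts specific to the gene list.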

Page 59: Question Answering v5

Entities and Relations
Question Answering templates

Entity
- Gene, Anatomical
- Behavior, Chemical

Relation
- Regulation (Gene-Gene)
- Expression (Gene-Anatomy)
- Function (Gene-Behavior) Biological Process
- Function (Gene-Chemical) Molecular Function
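
The entity and relation types above amount to a small typed schema. As a sketch, assuming relations are checked only by the types of their two arguments, the templates could be encoded as below. The names `RELATION_TEMPLATES`, `Entity`, and `matches_template` are hypothetical, and the slide's "Anatomical" entity is written as "Anatomy" to match its relation labels.

```python
from dataclasses import dataclass

ENTITY_TYPES = {"Gene", "Anatomy", "Behavior", "Chemical"}

# Relation name -> allowed (subject type, object type) pairs, from the slide.
RELATION_TEMPLATES = {
    "Regulation": [("Gene", "Gene")],
    "Expression": [("Gene", "Anatomy")],
    "Function": [("Gene", "Behavior"),   # biological process
                 ("Gene", "Chemical")],  # molecular function
}


@dataclass(frozen=True)
class Entity:
    name: str
    type: str


def matches_template(relation, subject, obj):
    """Return True if (subject, obj) has the argument types the relation allows."""
    return (subject.type, obj.type) in RELATION_TEMPLATES.get(relation, [])


# Placeholder entities (made up for illustration).
gene = Entity("geneA", "Gene")
region = Entity("mushroom body", "Anatomy")
ok = matches_template("Expression", gene, region)
bad = matches_template("Regulation", gene, region)
```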

Page 66: Towards the Interspace

The Analysis Environment technology is GENERAL!

BirdSpace? BeeSpace?

PigSpace? CowSpace?

ArthropodSpace? AnimalSpace?

BioSpace? MedSpace?