Dr. Michael Schroeder
Department of Computing
City University, London, UK
http://www.soi.city.ac.uk/~msch
Visiting Scientist
Medical Research Council
Cambridge, UK
BioGrid
Drowning in information...
• Biology has changed dramatically from an information-light to an information-intensive area
• Much publicised Human Genome Project is only tip of the iceberg
• >500 tools online
• >8000 new abstracts per month
LLNEYLEEVE EYEEDE
Heureka!
?????????????
BioGrid
• Provide access to multiple, heterogeneous and geographically distributed information sources.
• Perform active searches for relevant information in non-local domains (including retrieving, analysing, manipulating, and integrating information)
BioGrid Objectives
Objectives: Information and knowledge grid allowing knowledge discovery and access to multiple types of structured and unstructured data, including gene expression and protein interaction data
Business objectives:
• Grid for next generation classification research infrastructure for large proteomics and genomics databases
• Efficient transactional enterprise collaboration
• Faster time to market biotech innovation
Example
A scientist is interested in a gene, e.g. NOX4
– Search PubMed for articles
• Too many hits
• Gene also known under different names
– Analyse gene expression data
• Which genes behave similarly to NOX4?
• Function of NOX4?
– Analyse protein interactions
• Which interactions and processes does expression of NOX4 trigger?
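The question "which genes behave similarly to NOX4?" can be sketched as a ranking by correlation of expression profiles. This is only an illustration, not the BioGrid method: the Pearson correlation measure, the function names, the gene names GENE_A to GENE_C, and all expression values are assumptions invented for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_x * var_y)

def rank_similar(profiles, target):
    """Rank all other genes by correlation with the target gene, best first."""
    ref = profiles[target]
    scores = [(gene, pearson(ref, values))
              for gene, values in profiles.items() if gene != target]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Invented expression values over four conditions (illustration only).
PROFILES = {
    "NOX4":   [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [2.0, 4.0, 6.0, 8.0],  # rises with NOX4
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # falls as NOX4 rises
    "GENE_C": [1.0, 3.0, 2.0, 4.0],  # roughly follows NOX4
}

if __name__ == "__main__":
    for gene, score in rank_similar(PROFILES, "NOX4"):
        print(gene, round(score, 2))
```

Genes at the top of the ranking are candidates for sharing a function with the target gene; other similarity measures could be swapped in for `pearson`.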
Challenges
• Semantic complexity
– Computer does not “understand” data
– DBs and systems cannot inter-operate
• Computational complexity
– generating a protein interaction map takes ca. 7 days
– analysing large sets of gene expression data can take up to an hour
– analysis of large text bodies is complex
BioGrid Vision
[Diagram: BioGrid at the centre, linking interaction data, metabolic pathway data, expression data, sequences, scientific literature, and characterisation of target sequence]
Approach
• Semantic Web
– global and local ontologies to capture meta-data and facilitate semantic inter-operability
• Grid technology
– transparent access to distributed resources
• Agent technology
– personal information agent collecting and presenting relevant information on behalf of its user
[Architecture diagram. Labels: BioGrid Client (three times), BioGrid Server, Literature Classification Server, The Grid, Space Explorer, PSIMAP]
Classification server
• Finding and processing relevant scientific literature
Results of PubMed
• Lorenz P, Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation. Biol Chem. 2001 Apr;382(4):637-44.
• Fredericks WJ. An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene. Mol Cell Biol. 2000 Jul;20(14):5019-31.
• ...
[Callouts on the first entry: Author, Title, Journal, Year]
However, to a machine things look different!
Results of PubMed
....
Solution: tag data (XML)
Results of PubMed
• <author> </author> <title> . </title> <journal> </journal> <year> </year>
• <author> </author> <title> . </title> <journal> </journal> <year> </year>
• ...
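As a minimal sketch of what tagging buys, the two PubMed hits from the earlier slide can be written as XML and read back field by field with Python's standard library. The wrapper elements `<results>` and `<article>` and the function name `read_articles` are assumptions made for this example; only the field tags author, title, journal and year come from the slide.

```python
import xml.etree.ElementTree as ET

# The two hits from the slide, tagged by hand. <results> and <article>
# are hypothetical wrapper elements, not part of the original slide.
TAGGED = """<results>
  <article>
    <author>Lorenz P</author>
    <title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title>
    <journal>Biol Chem</journal>
    <year>2001</year>
  </article>
  <article>
    <author>Fredericks WJ</author>
    <title>An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene.</title>
    <journal>Mol Cell Biol</journal>
    <year>2000</year>
  </article>
</results>"""

def read_articles(xml_text):
    """Return one dict per <article>, mapping each field tag to its text."""
    root = ET.fromstring(xml_text)
    return [
        {field.tag: (field.text or "").strip() for field in article}
        for article in root.findall("article")
    ]

if __name__ == "__main__":
    for article in read_articles(TAGGED):
        print(article["year"], article["author"], "-", article["journal"])
```

The program can now separate the fields, but the tag names themselves are still just strings to it.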
However, to a machine things look different!
Results of PubMed
• ...
Solution: use ontologies (Semantic Web)
Semantic Web
• DAML+OIL is an XML-based language to specify ontologies
• Annotations of data refer to a global ontology (where appropriate), hence a joint understanding of data is possible
• Ongoing efforts in bioinformatics: e.g. Gene Ontology
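The idea of annotations referring to a global ontology can be sketched with plain dictionaries. This is not DAML+OIL syntax: the concept names, the two sources and their local field names are all hypothetical, chosen only to show how a shared vocabulary lets records from different databases be compared.

```python
# Hypothetical global ontology: the concepts every source may refer to.
GLOBAL_CONCEPTS = {"Author", "Title", "Journal", "Year"}

# Hypothetical annotations: each source maps its local field names
# to concepts of the global ontology.
SOURCE_A = {"au": "Author", "ti": "Title", "jo": "Journal", "dp": "Year"}
SOURCE_B = {"writer": "Author", "heading": "Title",
            "periodical": "Journal", "published": "Year"}

def to_global(record, annotations):
    """Translate a record's local field names into global concepts.

    Fields without an annotation are dropped, because nothing is
    known about their meaning.
    """
    translated = {}
    for local_name, value in record.items():
        concept = annotations.get(local_name)
        if concept in GLOBAL_CONCEPTS:
            translated[concept] = value
    return translated

if __name__ == "__main__":
    from_a = to_global({"au": "Lorenz P", "dp": "2001"}, SOURCE_A)
    from_b = to_global({"writer": "Lorenz P", "published": "2001"}, SOURCE_B)
    print(from_a == from_b)  # both records now use the same vocabulary
```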
Classification Server
Scientific objectives:
• Effective concept recognition
• Pattern matching
• Intelligent data sourcing agents and tagging technology
• Automated categorisation in a biotechnology domain
• Metadata hierarchy
• Functional interoperability methodology design
• Domain knowledge mapping
• Implementing a logical domain ontology
• Integration of agent & classification logic & visualisation technology
Space Explorer
• … is a general purpose visualisation tool facilitating interactive exploration of large data sets
• … deals with multi-variate and proximity data
• … provides
– principal component analysis
– multi-dimensional scaling (principal co-ordinate analysis, spring embedding)
– clustering
• … provides
– dendrograms
– 2D and 3D (using VRML) scatter plots
– graphs and colour maps
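To illustrate how clustering and dendrograms relate, here is a minimal sketch of single-linkage agglomerative clustering: the recorded merge order and merge distances are exactly what a dendrogram draws. It is not Space Explorer's implementation; the choice of single linkage, the function name, and the four sample points g1 to g4 are assumptions for the example.

```python
from math import dist

def single_linkage(points):
    """Agglomerative clustering with single linkage.

    points maps a name to a coordinate tuple. Returns the merges in
    order as (cluster_a, cluster_b, distance), where clusters are
    tuples of names.
    """
    clusters = [(name,) for name in sorted(points)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Invented 2D data points (illustration only).
POINTS = {
    "g1": (0.0, 0.0),
    "g2": (0.0, 1.0),
    "g3": (5.0, 0.0),
    "g4": (5.0, 2.0),
}

if __name__ == "__main__":
    for left, right, height in single_linkage(POINTS):
        print(left, "+", right, "at distance", height)
```

Each merge corresponds to one junction of the dendrogram, drawn at the height of the merge distance.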
Example: gene expression data
Example: Protein topology
Protein Interaction: PSIMAP
• Based on 3D structure, PSIMAP determines interactions of proteins
• Structure of the map is of great importance for understanding biological processes
• Generation and analysis of the map are computationally expensive
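A structure-based interaction test can be sketched as follows: two structural units are taken to interact if enough of their atoms lie close together in 3D. This is an illustrative assumption, not PSIMAP's documented procedure; the distance cutoff, the minimum number of contacts, the function names, and the toy coordinates are all invented for the example.

```python
from itertools import combinations
from math import dist

def count_contacts(coords_a, coords_b, cutoff):
    """Count atom pairs, one atom from each unit, within the cutoff distance."""
    return sum(1 for p in coords_a for q in coords_b if dist(p, q) <= cutoff)

def interaction_map(units, cutoff=5.0, min_contacts=2):
    """Return the set of unit pairs with at least min_contacts close atom pairs.

    cutoff and min_contacts are assumed parameters for this sketch.
    """
    edges = set()
    for a, b in combinations(sorted(units), 2):
        if count_contacts(units[a], units[b], cutoff) >= min_contacts:
            edges.add((a, b))
    return edges

# Toy 3D coordinates (illustration only): d1 and d2 are close, d3 is far away.
UNITS = {
    "d1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "d2": [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    "d3": [(20.0, 0.0, 0.0), (21.0, 0.0, 0.0)],
}

if __name__ == "__main__":
    print(interaction_map(UNITS))
```

Even this naive version compares every atom pair of every pair of units, which hints at why generating the map for a large structure database is computationally expensive.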
Partners
No.  Organisation (abbreviation)      Country  RTD role in the project
1    University of Groningen (RUG)    NL       User, Bioinformatics on drug discovery
2    ZooRobotics (ZRO)                NL       Co-ordinator, Supplier of GRID Classification Server, Exploitation Mng.
3    City University London (CIT)     UK       Supplier of intelligent agents and Space Explorer
4    University of Cyprus (UCY)       EL       Supplier of GRID knowledge engineering
5    Medical Research Centre (MRC)    UK       Supplier of PSIMAP, User, bioinformatics on Food and Nutrition
Pert diagram
[Diagram: work packages WP0 to WP7 arranged across phases labelled Integration, Analysis, Prototype Development, and Measurement and Evaluation; main deliverables: 1st prototype, 2nd prototype]
Work packages
Workpackage title
WP0 Management
WP1 Source domain analysis
WP2 Hierarchy creation, Metadata model development
WP3 Classification logic integration
WP4 Agent implementation
WP5 Visualisation implementation
WP6 Measurement and evaluation
WP7 Dissemination and exploitation
BioGrid
Expression Space: Space Explorer
Pathway Space:
Interaction Space: PSIMAP
Literature Space: Classification Server
BioGrid Mission: Distributed computational biology platform for fast pharmaceutical research