Anil Wipat University of Newcastle upon Tyne, UK A Grid based System for Microbial Genome Comparison...
-
A Grid-based System for Microbial Genome Comparison and Analysis
Anil Wipat
University of Newcastle upon Tyne, UK
-
Motivation: Genome Comparison
The past decade has seen the emergence of whole genome sequencing
Whole genome sequences can reveal a great deal about the biology of an organism
Comparing genomes is one of the most effective ways to exploit genome sequence information
Establishes the differences and similarities at the genetic level
Aids biologists in understanding pathogenicity, evolution, ecology, metabolism, etc.
-
Microbial genome comparison is commonly applied at two levels:
Nucleotide sequence comparison (whole genome), e.g. DNA ..atcggatcgtacgagcgatc.. against DNA ..atcccatcgaacgagcgatc..
All-against-all amino acid sequence comparisons between proteins, e.g. MCSAKMQTR.. against MSAKMPTR..
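At the nucleotide level, the simplest measure of similarity between two aligned fragments is percent identity. The sketch below computes it for the two example DNA fragments, assuming they are already aligned without gaps. Real tools such as Blastn and MUMmer also have to find the alignment itself, which this sketch does not attempt; the function name is illustrative only.

```python
def percent_identity(seq_a: str, seq_b: str) -> tuple[float, list[int]]:
    """Identity fraction and mismatch positions for two ungapped, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to the same length")
    # Positions (0-based) where the two sequences differ
    mismatches = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    identity = 1 - len(mismatches) / len(seq_a)
    return identity, mismatches


# The two example DNA fragments, without the ".." ellipses
identity, mismatches = percent_identity("atcggatcgtacgagcgatc", "atcccatcgaacgagcgatc")
print(f"identity = {identity:.0%}, mismatches at {mismatches}")  # identity = 85%, mismatches at [3, 4, 9]
```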
-
Motivation: Genome Comparison
The number of complete genome sequences is rapidly increasing as sequencing technology advances, e.g. ~200 whole genomes have been sequenced
Sequence analysis and comparison are becoming more computationally intensive
Large-scale genome comparison is already beyond the capability of many laboratories
How are we going to handle all these genomes? New methods and technologies for genome comparison are required.
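The cost grows quadratically with the number of genomes: an all-against-all comparison of n genomes involves n(n-1)/2 unordered genome pairs for each comparison tool, and every newly added genome has to be compared against all n genomes already held. The short calculation below illustrates this for the genome counts mentioned in these slides (10, 170 and ~200). It counts unordered pairs only, so a tool run in both directions would double the figures.

```python
def pairwise_comparisons(n_genomes: int) -> int:
    """Unordered genome pairs in an all-against-all comparison: n(n-1)/2."""
    return n_genomes * (n_genomes - 1) // 2


def new_comparisons(n_existing: int) -> int:
    """Extra pairs created when one genome is added to a pool of n_existing genomes."""
    return pairwise_comparisons(n_existing + 1) - pairwise_comparisons(n_existing)


for n in (10, 170, 200):
    print(f"{n} genomes -> {pairwise_comparisons(n)} pairs per tool")
print(f"adding a 201st genome -> {new_comparisons(200)} new pairs per tool")
```

So 200 genomes already mean 19,900 genome pairs per tool, and each new genome adds another 200.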
-
Microbase Project Overview
Aims to create a scalable, Grid-enabled analytical system to support microbial genome comparison
Aims to support both the biological and bioinformatics communities
Funded by BBSRC Bioinformatics and e-Science & DTI
Started April 2003
Collaboration with microbiologists and industrial partners, who provide use cases
-
Microbase: Functionality
A system that utilises Grid resources to automatically perform genome comparisons at nucleotide and protein levels
An information repository that:
maintains and exposes the results of these comparisons to users as a base-level dataset
provides canned algorithms for analysis
A Grid-enabled high-performance environment to execute remote user-specified computations
Data integration with remote, Grid-enabled databases, e.g. genomic, metabolic, protein interaction and gene expression databases
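The slides do not say which canned algorithms are provided. As a hypothetical example of the kind of analysis that can be run over stored all-against-all protein comparisons, the sketch below finds reciprocal best hits (a common proxy for putative orthologues) from (query, subject, bit score) hits between two genomes. The hit tuples, protein names and function names are invented for illustration.

```python
def best_hits(hits):
    """Map each query protein to its highest-scoring subject protein."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_ab, hits_ba):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    ab = best_hits(hits_ab)
    ba = best_hits(hits_ba)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Hypothetical (query, subject, bit score) hits between proteins of genomes A and B
hits_ab = [("A1", "B1", 250.0), ("A1", "B2", 90.0), ("A2", "B2", 180.0), ("A3", "B2", 120.0)]
hits_ba = [("B1", "A1", 248.0), ("B2", "A2", 175.0), ("B2", "A3", 110.0)]
print(reciprocal_best_hits(hits_ab, hits_ba))  # [('A1', 'B1'), ('A2', 'B2')]
```

A3 is excluded because its best hit, B2, prefers A2.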
-
MicrobaseLite: A Prototype
The first prototype of the Microbase system
Automatically performs all-against-all genome comparisons and exposes the resulting datasets
Provides services for biologists to browse and query genome sequences and comparison results
Helps to specify the entire Microbase system and to derive use cases
Implemented using a component-based architecture with Web services interfaces
Also uses existing Grid technology: the myGrid Notification Service
-
MicrobaseLite: Datasets
170+ microbial genomes, including bacteria, archaea and eukaryota, held in the GenomePool component
Results of all-against-all nucleotide sequence comparison (Blastn, MUMmer)
Results of all-against-all protein sequence comparison (Blastp, Ssearch, Promer)
Both sets of comparison results are held in the ComparisonPool component
Object-oriented data model of interspecies genome rearrangements: the OGRE module component (current research)
-
MicrobaseLite: Architecture
[Figure: MicrobaseLite architecture. Client side: client proxies (Notification Proxy, Web Services Proxy) and user tools (Data Processing, Graphical Viewer). Server side: Microbial Genome Pool, Genome Comparison Pool, OGRE Module and Notification Service.]
-
MicrobaseLite: Microbial Genome Pool
Provides a Web/Grid service-based information repository of microbial genomes
Maintains a database of 170+ microbial genomes
A Web service implementation of the BioJava interfaces
Uses the myGrid Notification Service to notify registered clients of new genomes
Available for use now with a prototype API
[Figure: Microbial Genome Pool components (GenomeLoader, BIOSQL, Web Service API), with internal and external notification via the Notification Service to clients and the Comparison Pool]
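The notify-on-new-genome behaviour can be sketched as a simple in-process publish/subscribe object: clients register a callback, and the pool calls every registered callback when a genome is added. This is a hypothetical illustration only; it does not use the myGrid Notification Service or its API, and the class, method and genome names are invented.

```python
from typing import Callable


class GenomePool:
    """Hypothetical stand-in for the Microbial Genome Pool: stores genomes, notifies subscribers."""

    def __init__(self) -> None:
        self._genomes: dict[str, str] = {}
        self._subscribers: list[Callable[[str], None]] = []

    def register(self, callback: Callable[[str], None]) -> None:
        """Register a client to be told the identifier of each new genome."""
        self._subscribers.append(callback)

    def add_genome(self, genome_id: str, sequence: str) -> None:
        """Store a genome, then notify every registered client of its identifier."""
        self._genomes[genome_id] = sequence
        for notify in self._subscribers:
            notify(genome_id)

    def get_sequence(self, genome_id: str) -> str:
        """Fetch a stored sequence, e.g. in response to a notification."""
        return self._genomes[genome_id]


received: list[str] = []
pool = GenomePool()
pool.register(received.append)
pool.add_genome("genome_1", "atcggatcg")
print(received)  # ['genome_1']
```

Passing only the identifier keeps notifications small; a subscriber fetches the sequence afterwards if it needs it.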
-
MicrobaseLite: Genome Comparison Pool
Retrieves genomes from the Microbial Genome Pool automatically on notification
Executes a variety of genome comparison tools: Blast, MUMmer, Promer, MSPcrunch
Incorporates a Task Scheduler for parallel processing
Uses N1 Grid Engine (a batch system) to dispatch comparison tasks to run on Linux clusters
Comparison outputs are processed and stored in a relational database (MySQL)
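The post-processing step can be sketched as parsing tab-separated hit lines and loading them into a relational table. This sketch assumes the 12-column tabular layout produced by NCBI BLAST+ with `-outfmt 6` (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). The original system used MySQL and may have used a different output format and schema, so Python's built-in `sqlite3`, the `hits` table and `load_hits` are stand-ins.

```python
import sqlite3

# Hypothetical schema; column names follow the 12 BLAST+ tabular (-outfmt 6) fields
SCHEMA = """
CREATE TABLE IF NOT EXISTS hits (
    tool TEXT, qseqid TEXT, sseqid TEXT, pident REAL, length INTEGER,
    mismatch INTEGER, gapopen INTEGER, qstart INTEGER, qend INTEGER,
    sstart INTEGER, send INTEGER, evalue REAL, bitscore REAL
)
"""
# Converters for the 12 tab-separated columns, in order
CONVERTERS = (str, str, float, int, int, int, int, int, int, int, float, float)


def load_hits(conn: sqlite3.Connection, tool: str, lines) -> int:
    """Parse tabular hit lines, insert them into the hits table, return rows inserted."""
    conn.execute(SCHEMA)
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != len(CONVERTERS):
            raise ValueError(f"expected {len(CONVERTERS)} columns, got {len(fields)}")
        rows.append((tool, *(convert(f) for convert, f in zip(CONVERTERS, fields))))
    placeholders = ", ".join("?" * (len(CONVERTERS) + 1))  # one extra for the tool column
    conn.executemany(f"INSERT INTO hits VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
# One invented hit line in the 12-column tabular layout
sample = ["gA_p1\tgB_p7\t87.50\t120\t15\t0\t1\t120\t5\t124\t1e-50\t210.0"]
print(load_hits(conn, "blastp", sample))  # 1
print(conn.execute("SELECT tool, sseqid, pident FROM hits").fetchall())  # [('blastp', 'gB_p7', 87.5)]
```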
-
Task Scheduler and scalability
Execution times of all-against-all comparisons of 10 microbial genomes (Blastp, Blastn, MSPcrunch, MUMmer and PROmer)
Number of processors    Execution time (minutes)
1                       978.02
10                      103.03
20                      57.67
30                      48.48
40                      37.33
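These execution times can be summarised with the usual speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p. The sketch below derives both from the measured times; on these figures the speedup on 40 processors works out to roughly 26-fold, with efficiency falling from about 0.95 at 10 processors to about 0.65 at 40. The slides do not say what causes the drop.

```python
# Measured times from the slide: processors -> minutes (10 genomes, five tools)
TIMINGS = {1: 978.02, 10: 103.03, 20: 57.67, 30: 48.48, 40: 37.33}


def speedup_table(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Return {p: (speedup S(p) = T(1)/T(p), efficiency E(p) = S(p)/p)}."""
    serial = timings[1]
    table = {}
    for p, t in sorted(timings.items()):
        speedup = serial / t
        table[p] = (speedup, speedup / p)
    return table


for p, (s, e) in speedup_table(TIMINGS).items():
    print(f"{p:>2} processors: speedup {s:5.1f}, efficiency {e:.2f}")
```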
[Figure: Task Scheduler within the Genome Comparison Pool, showing input from the Microbial Genome Pool (BIOSQL) and a workstation, pre-load, job creation, threshold control, job submission, job execution on N1 Grid Engine, job state checking, and output to the Comparison Database]
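The scheduler stages named in the figure (job creation, threshold control, job submission, job state checking) can be sketched with a bounded worker pool. In this hypothetical sketch, `run_comparison` is a placeholder that does not call Blast or N1 Grid Engine, and `threshold` stands in for the threshold control by capping how many jobs run at once. The tool names come from the slide; all function names are invented.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import combinations

# Tool names as listed on the slide; used here only as labels
TOOLS = ("blastn", "blastp", "mspcrunch", "mummer", "promer")


def create_jobs(genome_ids, tools=TOOLS):
    """Job creation: one job per (tool, unordered genome pair)."""
    return [(tool, a, b) for a, b in combinations(sorted(genome_ids), 2) for tool in tools]


def run_comparison(job):
    """Hypothetical placeholder for submitting one job to a batch system and waiting for it."""
    tool, a, b = job
    return f"{tool}:{a}-vs-{b}"


def run_all(jobs, threshold=4):
    """Run jobs with at most `threshold` in flight at once; collect results as they finish."""
    results = []
    with ThreadPoolExecutor(max_workers=threshold) as pool:
        futures = [pool.submit(run_comparison, job) for job in jobs]
        for future in as_completed(futures):  # analogous to job state checking
            results.append(future.result())
    return results


jobs = create_jobs(["g1", "g2", "g3"])
print(len(jobs), len(run_all(jobs)))  # 15 15
```

Threads suffice here on the assumption that the heavy work runs on the cluster and the local process mostly waits for jobs to finish.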
-
MicrobaseLite: User Tools
Demonstration graphical tools are under development
The Genome Browser allows users to view genomes, the comparison results and the results of canned algorithms
Deployed client-side, operating via Web services
-
Vision for the full Microbase System
Continue to explore scalability issues using MicrobaseLite as a platform
Towards seamless scalability: harnessing remote clusters on demand
A system for the submission and enactment of remotely conceived code or workflows for user-defined comparative analysis
Investigating the integration of the Taverna core to enact SCUFL workflows within Microbase
-
Conclusions
Microbase aims to exploit Grid resources to provide a scalable system for microbial genome comparison
MicrobaseLite has been produced as a prototype and demonstrator application for the biologist/bioinformatician
Work is now underway on the full Microbase: a system to support remotely conceived computations
-
Acknowledgements
The Microbase Team: Anil Wipat, Yudong Sun, Matthew Pocock, Keith Flanagan, Pete Lee and Paul Watson
The Microbase user requirements/use case contributors
The myGrid project (particularly Southampton and EBI)
The industrial supporters: NonLinear Dynamics, NCIMB, Arrow Therapeutics, Angel Biotech, Complement Genomics, ACS Dobfar, AstraZeneca
See www.microbase.org.uk
-
OGRE: Object-oriented Genome REarrangements Model
A dataset that captures genomic rearrangements between microorganisms
Object-Oriented (OO) concepts and formalism are being used to classify the results of the nucleotide sequence comparison
An ontology and OO conceptual model are being developed to describe chromosomal rearrangements and to define objects that can represent them
Algorithms have been developed to recognise defined rearrangement features in nucleotide sequence comparison data
Objects are made persistent in an OO database
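The slides do not describe the OGRE recognition algorithms. As a hypothetical illustration of representing a rearrangement as an object, the sketch below takes ungapped matches between a reference and a query genome (the kind of output a tool like MUMmer reports, though the `Match` fields here are invented) and groups consecutive reverse-strand matches, ordered along the reference, into `Inversion` objects. Real rearrangement detection must also handle translocations, repeats and noisy matches, which this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    ref_start: int
    query_start: int
    length: int
    reverse: bool  # True if the query matches the reverse complement of the reference


@dataclass(frozen=True)
class Inversion:
    ref_start: int
    ref_end: int
    matches: tuple


def _make_inversion(run):
    """Build an Inversion spanning a run of reverse-strand matches."""
    last = run[-1]
    return Inversion(run[0].ref_start, last.ref_start + last.length - 1, tuple(run))


def find_inversions(matches):
    """Group runs of consecutive reverse-strand matches (by reference position) into Inversions."""
    inversions, run = [], []
    for m in sorted(matches, key=lambda match: match.ref_start):
        if m.reverse:
            run.append(m)
        elif run:
            inversions.append(_make_inversion(run))
            run = []
    if run:  # close a run that reaches the end of the reference
        inversions.append(_make_inversion(run))
    return inversions


matches = [
    Match(1, 1, 100, False),
    Match(101, 300, 100, True),
    Match(201, 200, 100, True),
    Match(301, 301, 100, False),
]
for inv in find_inversions(matches):
    print(inv.ref_start, inv.ref_end, len(inv.matches))  # 101 300 2
```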
-
MicrobaseLite: OGRE Module
Performs object-oriented analysis and storage of genome rearrangements
An OO dataset captures genomic rearrangements revealed through nucleotide sequence comparison
Made persistent in an OO database
Provides a Web services interface for external users to query and analyse the OO dataset
[Figure: OGRE Module components (ObjectModelBuilder, Object-oriented Database, Query & Execution), fed by the Comparison Pool and exposed via Web Services]