The Semantic Web, Service Oriented Architectures, the myGrid Experience
Carole Goble

EPSRC funded UK e-Science Program Pilot Project. Thanks to the other members of the Taverna project,


Roadmap
- The problem
- myGrid
- Semantic Service / Workflow Discovery
- Provenance and metadata modelling
- Semantic Web is Semantic Glue

The problem
1. Identify new, overlapping sequence of interest
2. Characterise the new sequence at nucleotide and amino acid level
Cutting and pasting between numerous web-based services, e.g. BLAST, InterProScan etc.

acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

Middleware for Life Science solutions
- Interoperation of services and data sources
- Repeat
- Reuse and Share
- Provenance
- Manage results
- My tools, my resources

Middleware for Life Science
- Taverna Workflow Workbench
- OGSA-Distributed Query Processing
- Results management
- LSID
- mIR
- e-Science coordination
- e-Science mediator
- e-Science process patterns
- e-Science events
- Notification service
- Architectural framework
- myGrid information model
- Metadata & provenance management using semantics
- KAVE
- Legacy integration
- Publication and Discovery using semantics
- Feta
- Pedro
- Ontology
- Portal & Application tools

How to select among services?
- Mostly, inputs and outputs are strings
- Domain specific descriptions of capabilities
- Selection is part of workflow assembly by bioinformaticians
- Selection of alternates for failure is also generally user defined; the alternates are usually replicas, but need not be

First, find your service
- Which means describe your service
- Publish and find services (and workflows) with description using an ontology
- Define domain types for objects passed around the workflow
- Define a set of dimensions with which service capabilities can be defined
- GRIMOIRES / WebDAV directory
- Tied to BioMOBY Central

Semantic discovery
- Publish and find services (and workflows) with description using an ontology (in OWL/RDF)
- Define domain types for objects passed around and a set of dimensions with which service capabilities can be defined using processor abstraction
- Bootstrapping descriptions
- Mining and maintaining descriptions
- The Expert Annotator
- GRIMOIRES / WebDAV directory
- Tie into BioMOBY Central
- http://phoebus.cs.man.ac.uk:8100/feta-beta/mygrid/descriptions/

Phillip Lord, Pinar Alper, Chris Wroe, and Carole Goble. Feta: A light-weight architecture for user oriented semantic service discovery. In Proc. of 2nd European Semantic Web Conference, Crete, June 2005.

Layered model (figure labels)
- OWL-S, OWL-WS, WSMO, WSDL-S
- Web Interface
- Processor API
- Generic Schema for Service (part of Information model)
- Specific Application Ontology, e.g. caCORE
- Semantic Web Services

Wroe C, Goble CA, Greenwood M, Lord P, Miles S, Papay J, Payne T, Moreau L. Automating Experiments Using Semantic Data on a Bioinformatics Grid. IEEE Intelligent Systems, Jan/Feb 2004.

- We don't describe WSDL, we describe operations and processors
- We are classifying for people, not machines, so don't be too clever!

Service description schema (figure labels)
- Operation: name, description, task, method, resource, application
- Service: name, description, author, organisation
- Parameter: name, description, semantic type, format, transport type, collection type, collection format
- Kinds of service and operation: WSDL based Web service, WSDL based operation, Soaplab service, bioMoby service, workflow, Local Java code
- Relations: hasInput, hasOutput, subclass

Semantic Web Services
- Semantic Descriptions for Discovery
- Automated Discovery of services or workflows
- Knowledge assisted brokering & match making
- Guided instantiation and substitution
- Composition
- Automated Composition
- Self organising SOA
- Guided workflow assembly
- Composition (workflow) verification and validation

Semantics-enabled Problem Solving
- Task configuration
- Workflow construction
- Workflow Advisor
- Semantic service discovery
- EDSO task ontology

Observations
- Technical and Abstraction mismatches
- Man vs Machine. Manual vs Automation. Service vs Domain Semantics.
- Basic errors in modelling.
- Web services in the wild suck.
- Not everything is a Web Service. Legacy Services, middleware, content and practice.
- Practicality mismatches
- Automated or assisted discovery: desirable, likely, popular
- Automated composition: undesirable, unlikely, unpopular

Capturing and Curating Content
- Annotation is hard. Building the Ontology is hard. QA is hard. Keeping the annotation up to date is hard.
- The Expert annotator; Altruism for Reuse.
- Quality Control; Hendler's Principle: A little semantics goes a long way!
- Too complicated to use. Tools!!
- Sharing takes effort. Unanticipated reuse by people you don't know in automated workflows.
- The metadata needed pays off, but it's challenging and costly to obtain.
- Automated, service providers, network effects
- Quality control. Misuse. Inappropriate use.
- Competitive advantage, Intellectual property.
- Workflow design: local or licensed services

The devil is in the detail
- Experiment provenance
- Simple workflow
- Descriptions in biological language
- Workflows for automagical execution: implicit iteration, generous typing
- Debugging and rerunning provenance logs
- Simple classifications of services
- Expressive ontologies to match up services automatically
- Descriptions for automatic service execution and fault management

e-Scientific method (figure, courtesy Jim Myers, NCSA)
- in vivo, in vitro, in silico
- Discovery
- Electronic Notebook
- Scientific Provenance
- Engineering Provenance
- Authorization
- Project Organization
- Logging
- Curation
- Scientific Content

Taverna workflow workbench in myGrid

Provenance in myGrid
- The process
- The data derivation path
- The ownership
- The evidence of knowledge
- Figure labels: a1, E1:S1, X1, E1:S2, Y1, Z1, Manchester university; how the Y1 was produced using a1

Provenance graph representation
- Identity for the node: URI (Uniform Resource Identifier), an extension of URL
- An RDF (Resource Description Framework) graph: derivedFrom, inputOf
- Ontologies, telling what they are: isA, hasFeature
- Each URI is associated with a set of provenance statements
- An RDF provenance graph

RDF provenance graph (figure labels)
- Nodes: urn:data:f1, urn:data:f2, urn:data:3, urn:data1, urn:data2, urn:data12, urn:compareinvocation3, urn:BlastNInvocation3, urn:invocation5, urn:hit1, urn:hit2, urn:hit5, urn:hit8, urn:hit10, urn:hit50
- Properties: [input], [output], [directlyDerivedFrom], [distantlyDerivedFrom], [instanceOf], [hasHits], [similar_sequence_to], [performsTask], [contains], [hasName], [type]
- Concepts: Blast_report, SwissProt_seq, Sequence_hit, Find similar sequence, DatumCollection, LSDatum
- Literals: New sequence, Missed sequence
- Legend: Data generated by services/workflows, Concepts, Services, literals, Properties

Resource Description Framework Provenance
- Flexible and extensible schema
- Data fusion and aggregation across provenance metadata
- Reasoning and querying over descriptions
- Transparent description

myGrid Provenance example

Annotate Anything
- People, meetings, discussions, conference talks
- Scientific publications, recommendations, quality comments
- Events, notifications, logs
- Services and resources
- Schemas and catalogue entries
- Models, codes, builds, workflows
- Data files and data streams
- Sensors and sensor data
- DFDL, JSDL, SAML, WSDL, WSRF, DL*, ML* as RDF?
- If you are using a controlled vocabulary, then let's use a standard controlled vocabulary language.

Seamark Demonstration: Identification of new drug candidates for BRKCB-1 (courtesy Joanne Luciano)

Observations
- Flexible metadata description for data
- Multi tiered model for different perspectives
- Machine vs Person; the ontologies for people discovery are not good enough for knowledge aggregation
- Make the semantics invisible
- Provenance aggregation
- Identity crisis
- Exposing knowledge means knowledge exposure. Reluctance to give up knowledge assets. Vulnerability. Knowledge is power. Incentive models. IPR. Privacy.
- Capturing the Semantic Content explicitly. Acquiring ontology annotations; hard to describe policies. Vagueness and trivia. Trying to capture people-focused provenance.
- Hendler principle: A little semantics goes a long way.

Use of Semantic Web Technologies
- Data mining
- Knowledge Discovery
- Smart search
- Social networking
- Smart portals
- Agents
- Information Integration and aggregation

A Semantic Web of Life Science
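
Sketch: discovery over semantic service descriptions. The discovery slides describe operations annotated with a task and with parameters that carry a semantic type, then found by querying those annotations against an ontology. The Python below is a minimal illustration of that idea only, not the Feta implementation. The class layout, the toy isA table, the registry entries and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Parameter:
    name: str
    semantic_type: str      # ontology term (hypothetical vocabulary)
    format: str = "string"  # most inputs and outputs are plain strings


@dataclass
class Operation:
    name: str
    task: str                                    # what the operation does
    inputs: list = field(default_factory=list)   # hasInput
    outputs: list = field(default_factory=list)  # hasOutput


# Hypothetical toy ontology: child term -> parent term (isA).
IS_A = {
    "nucleotide_sequence": "sequence",
    "protein_sequence": "sequence",
    "blast_report": "report",
}


def subsumes(general, specific):
    """True if `specific` equals `general` or is a descendant of it."""
    term = specific
    while term is not None:
        if term == general:
            return True
        term = IS_A.get(term)
    return False


def find_operations(registry, input_type=None, task=None):
    """Return operations that accept `input_type` and/or perform `task`."""
    hits = []
    for op in registry:
        if task is not None and not subsumes(task, op.task):
            continue
        if input_type is not None and not any(
            subsumes(p.semantic_type, input_type) for p in op.inputs
        ):
            continue
        hits.append(op)
    return hits


# Illustrative registry entries; the annotations are made up.
REGISTRY = [
    Operation("blastn", "similarity_search",
              inputs=[Parameter("query", "nucleotide_sequence")],
              outputs=[Parameter("report", "blast_report")]),
    Operation("interproscan", "domain_search",
              inputs=[Parameter("query", "protein_sequence")],
              outputs=[Parameter("report", "report")]),
    Operation("seq_length", "sequence_statistics",
              inputs=[Parameter("seq", "sequence")],
              outputs=[Parameter("length", "integer")]),
]

for op in find_operations(REGISTRY, input_type="nucleotide_sequence"):
    print(op.name, "-", op.task)
```

An operation declared to accept the general term "sequence" is returned for a nucleotide sequence query, which is the small amount of reasoning that a flat string-typed interface cannot offer.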
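
Sketch: walking a derivation path in a provenance graph. The provenance slides describe data items and service invocations identified by URIs and linked by RDF statements such as input, output and directlyDerivedFrom. The Python below stores such statements as plain (subject, predicate, object) tuples and follows directlyDerivedFrom links to recover how a result was produced. It is an assumption-laden illustration, not the myGrid or KAVE code: the urn:example identifiers and the links between them are invented, and only the predicate names are borrowed from the slides.

```python
from collections import defaultdict

# Hypothetical provenance statements as (subject, predicate, object).
TRIPLES = [
    ("urn:example:seq1", "inputOf", "urn:example:blast_invocation1"),
    ("urn:example:blast_invocation1", "output", "urn:example:blast_report1"),
    ("urn:example:blast_report1", "directlyDerivedFrom", "urn:example:seq1"),
    ("urn:example:hit_list1", "directlyDerivedFrom", "urn:example:blast_report1"),
    ("urn:example:blast_report1", "instanceOf", "Blast_report"),
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under each subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def derivation_path(triples, node, predicate="directlyDerivedFrom"):
    """Return every ancestor of `node` along `predicate`, nearest first."""
    index = index_by_subject(triples)
    ancestors = []
    seen = {node}
    frontier = [node]
    while frontier:
        next_frontier = []
        for current in frontier:
            for pred, obj in index.get(current, []):
                if pred == predicate and obj not in seen:
                    seen.add(obj)
                    ancestors.append(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return ancestors


print(derivation_path(TRIPLES, "urn:example:hit_list1"))
```

The first ancestor returned is the direct source; anything after it corresponds to what the slides label distantlyDerivedFrom. Because the statements are schema-free triples, new predicates can be added without changing the traversal.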