myEquivalents, aka a new cross-reference service

myEquivalents, aka Cross Reference Service
Marco Brandizi, EBI, 14 Feb 2013
(Image Source: http://stackoverflow.com/questions/13340232/pythagoras-tree-to-windy-tree)


Page 1: myEquivalents, aka a new cross-reference service

myEquivalents, aka Cross Reference Service

Marco Brandizi, EBI, 14 Feb 2013
(Image Source: http://stackoverflow.com/questions/13340232/pythagoras-tree-to-windy-tree)

Page 2: myEquivalents, aka a new cross-reference service

It's not rocket science...

(Image Source: http://www.chinapage.com/space/moon/orbiter.html)

Page 3: myEquivalents, aka a new cross-reference service

… yet...

(Image Source: oh, c'mon! 2001: A Space Odyssey)

Page 4: myEquivalents, aka a new cross-reference service

Rationale

References: AE ENA

References: BioSD ENA

References: BioSD AE

Page 5: myEquivalents, aka a new cross-reference service

Rationale

References: AE ENA

References: BioSD AE

References: BioSD ENA

Page 6: myEquivalents, aka a new cross-reference service

So, it's about equivalence relations

Page 7: myEquivalents, aka a new cross-reference service

Why a Centralised Service

Bundle 1
  BioSD Samples: SAMEA597705
  AE Experiments: E-AFMX-11, http://www.ebi.ac.uk/arrayexpress/experiments/E-AFMX-11
  AE Data: E-AFMX-11, http://www.ebi.ac.uk/arrayexpress/files/E-AFMX-11
  ENA Sequences: SRR034107

Bundle 2
  http://dbpedia.org/resource/Barak_h_obama
  http://en.wikipedia.org/wiki/Barack_Obama
  http://www.freebase.com/view/en/barack_obama

Managing equivalence classes is more compact and more efficient
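
To make the compactness claim concrete, here is a minimal Java sketch of bundle-based storage. It is an illustration under stated assumptions, not the myEquivalents code: the class BundleStore, its methods and the "service:accession" entity ID format are all hypothetical. A bundle of n entities takes n membership rows, while storing every equivalent pair explicitly in both directions takes n(n-1).

```java
import java.util.*;

/** Hypothetical in-memory bundle store: a sketch, not the myEquivalents implementation. */
class BundleStore {
    // One row per entity (entity ID -> bundle ID), however large the bundle is.
    private final Map<String, Integer> bundleOf = new HashMap<>();
    // Reverse index, so a bundle's members are found without scanning every row.
    private final Map<Integer, Set<String>> members = new HashMap<>();
    private int nextBundleId = 1;

    /** Stores the given entities as one new bundle; assumes none of them is stored yet. */
    int storeBundle(String... entityIds) {
        int id = nextBundleId++;
        Set<String> bundle = new TreeSet<>(Arrays.asList(entityIds));
        members.put(id, bundle);
        for (String entityId : bundle) bundleOf.put(entityId, id);
        return id;
    }

    /** All entities equivalent to the given one (itself excluded). */
    Set<String> equivalentsOf(String entityId) {
        Integer id = bundleOf.get(entityId);
        if (id == null) return new TreeSet<>();
        Set<String> result = new TreeSet<>(members.get(id));
        result.remove(entityId);
        return result;
    }

    /** Rows actually stored: one per entity. */
    int storedRows() { return bundleOf.size(); }

    /** Rows an explicit pair store needs for n equivalent entities, both directions included. */
    static int explicitPairRows(int n) { return n * (n - 1); }
}

class BundleStoreDemo {
    public static void main(String[] args) {
        BundleStore store = new BundleStore();
        store.storeBundle("BioSD/Samples:SAMEA597705", "AE/Experiments:E-AFMX-11",
                          "AE/Data:E-AFMX-11", "ENA/Sequences:SRR034107");
        System.out.println(store.equivalentsOf("ENA/Sequences:SRR034107"));
        System.out.println(store.storedRows() + " rows vs "
                           + BundleStore.explicitPairRows(4) + " explicit pairs");
    }
}
```

Symmetry and transitivity come for free here: any member of the bundle returns all the others, and no pair is ever stored.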

Page 8: myEquivalents, aka a new cross-reference service

Why a Centralised Service

Simplifies management
  URI auto-creation
  Links are updated independently of their consumers, and only once
Avoids redundancy
  Implicit symmetry and transitivity in the bundles
  Single-point storage and rendering vs one per repository
More efficient
  A specialised service for this is potentially faster, e.g. sameas.org
More features can be added to the basic service
  Multiple access formats and paradigms (e.g., XML, RDF, SPARQL)
  MIRIAM integration

Page 9: myEquivalents, aka a new cross-reference service

The Model

Entities (Service + Accession)
  BioSD/Samples, SAMEA597705
  AE/Experiments, E-AFMX-11
  AE/Data, E-AFMX-11
  ENA/Sequences, SRR034107

Entity Mapping
  Bundle (i.e., partition class)

Repositories: BioSD, ENA, AE
  Each repository provides services

Service collection: same accessions, implicit mapping

Service Properties: Title, Description, URI Pattern

Repository Properties: Title, Description, URL, Managing Organization, Logo URL
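
The model above can be sketched as three small Java classes. This is a hedged illustration, not the project's actual object model: the class and field names, the "service:accession" ID format and the "$id" placeholder in the URI pattern are assumptions made for the example. It shows how an entity's URI can be created automatically from its service's URI pattern and its accession.

```java
/** A repository that provides services (properties taken from the slide, names hypothetical). */
class Repository {
    final String name, title, url;
    Repository(String name, String title, String url) {
        this.name = name; this.title = title; this.url = url;
    }
}

/** A service inside a repository; its URI pattern uses a hypothetical "$id" placeholder. */
class Service {
    final String name, title, uriPattern;
    final Repository repository;
    Service(String name, String title, String uriPattern, Repository repository) {
        this.name = name; this.title = title;
        this.uriPattern = uriPattern; this.repository = repository;
    }
}

/** An entity is identified by the pair (service, accession). */
class Entity {
    final Service service;
    final String accession;
    Entity(Service service, String accession) {
        this.service = service; this.accession = accession;
    }

    /** Assumed textual ID format: service name, colon, accession. */
    String id() { return service.name + ":" + accession; }

    /** URI auto-creation: the accession is substituted into the service's pattern. */
    String uri() { return service.uriPattern.replace("$id", accession); }
}

class ModelDemo {
    public static void main(String[] args) {
        Repository ae = new Repository("AE", "ArrayExpress", "http://www.ebi.ac.uk/arrayexpress");
        Service experiments = new Service("AE/Experiments", "ArrayExpress experiments",
            "http://www.ebi.ac.uk/arrayexpress/experiments/$id", ae);
        Entity entity = new Entity(experiments, "E-AFMX-11");
        System.out.println(entity.id() + " -> " + entity.uri());
    }
}
```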

Page 10: myEquivalents, aka a new cross-reference service

API Examples (Java, Mapping)

public interface EntityMappingManager
{
  public void storeMappings ( String ... entityIds );
  public void storeMappingBundle ( String ... entityIds );
  public int deleteMappings ( String ... entityIds );
  public int deleteEntities ( String ... entityIds );
  public EntityMappingSearchResult getMappings ( Boolean wantRawResult, String ... entityIds );
  public EntityMappingSearchResult getMappingsForTarget ( Boolean wantRawResult, String targetServiceName, String entityId );
  public String getMappingsAs ( String outputFormat, Boolean wantRawResult, String ... entityIds );
  public String getMappingsForTargetAs ( String outputFormat, Boolean wantRawResult, String targetServiceName, String entityId );
  public void close ();
}

Multiple access means
  Programmatic API
  Command line
  REST Web Service
Multiple data exchange formats
  Java and Java REST (Jersey used, client available)
  XML (the same that comes from REST, mapped via JAXB)
  JSON (future, maybe)
  RDF (future, more later)
Queries via service+accession or URI (in future)
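
As a rough idea of how such an interface could behave, here is a small in-memory stand-in for a subset of it. Everything in it is an assumption made for illustration: the class InMemoryMappingManager is hypothetical, storeMappings is assumed to read its arguments as consecutive pairs, and getMappings returns a plain Set rather than the project's EntityMappingSearchResult. The point is that storing pairs still yields bundles, so transitivity is implicit.

```java
import java.util.*;

/** Hypothetical in-memory stand-in for part of EntityMappingManager; not the real implementation. */
class InMemoryMappingManager {
    // Every entity ID points at the (shared) set holding its whole bundle.
    private final Map<String, Set<String>> bundleOf = new HashMap<>();

    /** Assumption: entity IDs are read as consecutive pairs (a1, b1, a2, b2, ...). */
    void storeMappings(String... entityIds) {
        if (entityIds.length % 2 != 0)
            throw new IllegalArgumentException("Expected an even number of entity IDs");
        for (int i = 0; i < entityIds.length; i += 2) merge(entityIds[i], entityIds[i + 1]);
    }

    /** Puts all the given entities into the same bundle. */
    void storeMappingBundle(String... entityIds) {
        for (String entityId : entityIds) merge(entityIds[0], entityId);
    }

    /** Removes entities from their bundles; returns how many were actually stored. */
    int deleteEntities(String... entityIds) {
        int deleted = 0;
        for (String entityId : entityIds) {
            Set<String> bundle = bundleOf.remove(entityId);
            if (bundle != null) { bundle.remove(entityId); deleted++; }
        }
        return deleted;
    }

    /** The whole bundle of the given entity (itself included), or an empty set if unknown. */
    Set<String> getMappings(String entityId) {
        Set<String> bundle = bundleOf.get(entityId);
        return bundle == null ? new TreeSet<>() : new TreeSet<>(bundle);
    }

    // Merges the bundles of a and b, moving the smaller one into the larger one.
    private void merge(String a, String b) {
        Set<String> ba = bundleOf.computeIfAbsent(a, k -> new HashSet<>(Collections.singleton(k)));
        Set<String> bb = bundleOf.computeIfAbsent(b, k -> new HashSet<>(Collections.singleton(k)));
        if (ba == bb) return;
        if (ba.size() < bb.size()) { Set<String> tmp = ba; ba = bb; bb = tmp; }
        ba.addAll(bb);
        for (String member : bb) bundleOf.put(member, ba);
    }
}

class MappingManagerDemo {
    public static void main(String[] args) {
        InMemoryMappingManager manager = new InMemoryMappingManager();
        manager.storeMappings("BioSD/Samples:SAMEA597705", "AE/Experiments:E-AFMX-11",
                              "AE/Experiments:E-AFMX-11", "ENA/Sequences:SRR034107");
        // The sample and the sequence were never paired directly, yet they share a bundle.
        System.out.println(manager.getMappings("BioSD/Samples:SAMEA597705"));
    }
}
```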

Page 11: myEquivalents, aka a new cross-reference service

API Examples (Java)

Page 12: myEquivalents, aka a new cross-reference service

API Examples (Web Service)

Page 13: myEquivalents, aka a new cross-reference service

Component-based Architecture

Components and their topology are configured/instantiated via Spring
Easy to build features like:
  Caching
  Logging
  Layered computations (e.g., add services in the same collection)
  Integration of 3rd-party systems (e.g., MIRIAM, more later)
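
A sketch of how such layering could look, wired by hand in plain Java rather than through Spring configuration. The interface MappingLookup and the two wrapper classes are hypothetical names invented for this example. Each layer implements the same interface and delegates to the next one, so caching or logging can be added or removed without touching the back-end.

```java
import java.util.*;

/** Hypothetical component interface: anything able to resolve an entity's equivalents. */
interface MappingLookup {
    Set<String> lookup(String entityId);
}

/** Caching layer: asks the wrapped component only the first time an entity is requested. */
class CachingLookup implements MappingLookup {
    private final MappingLookup delegate;
    private final Map<String, Set<String>> cache = new HashMap<>();

    CachingLookup(MappingLookup delegate) { this.delegate = delegate; }

    @Override
    public Set<String> lookup(String entityId) {
        return cache.computeIfAbsent(entityId, delegate::lookup);
    }
}

/** Logging layer: records every request that reaches it, then passes it on. */
class LoggingLookup implements MappingLookup {
    private final MappingLookup delegate;
    final List<String> log = new ArrayList<>();

    LoggingLookup(MappingLookup delegate) { this.delegate = delegate; }

    @Override
    public Set<String> lookup(String entityId) {
        log.add("lookup " + entityId);
        return delegate.lookup(entityId);
    }
}

class LayeringDemo {
    public static void main(String[] args) {
        // Stand-in back-end that always answers with the same fixed bundle.
        MappingLookup backend = id -> new TreeSet<>(Arrays.asList(id, "AE/Experiments:E-AFMX-11"));
        LoggingLookup logging = new LoggingLookup(backend);
        MappingLookup service = new CachingLookup(logging);

        service.lookup("BioSD/Samples:SAMEA597705");
        service.lookup("BioSD/Samples:SAMEA597705"); // served from the cache
        System.out.println("Requests that reached the back-end: " + logging.log.size());
    }
}
```

In a Spring set-up the same chain would be declared in configuration instead of being built in main.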

Page 14: myEquivalents, aka a new cross-reference service

Related Work

myEquivalents was inspired by this
Does pretty much what we do
With a very similar internal model
But for URIs only
Code not available
Only available as SaaS, no binary to deploy

Page 15: myEquivalents, aka a new cross-reference service

Related Work

Pair model for URIs is a standard
Equivalence-based model missing
Dual identification mechanism missing

Page 16: myEquivalents, aka a new cross-reference service

Future: RDF, SPARQL, Semantic Web

Dereferenceable URIs, with RDF output
  Keeping support for the accession-based model too
SPARQL, with support for both:
  ?b a mye:Bundle; mye:has-entity ?e1, ?e2, ?e3 (equivalence class model)
  ?entity1 owl:sameAs ?entity2 (mapping pair model)
  and for entity containers:
    _:e1 mye:provided-by [ a mye:Service; dc:title 'BioSD' ]
  adding reasoning over service types could come easily
    e.g. sample-service is-a biomaterial-service
To be implemented with direct translation from Java objects to SPARQL (not just export), e.g., using ARQ in Jena
Support for inference directly in the object model
  faster than a generic reasoner
Support for SPARQL/UPDATE?
  Would allow for using an endpoint straight as back-end
Support for keyword-based search, as in sameas.org
  Requires the addition of attributes (e.g., title, description), nothing available at the moment
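
To show the difference between the two RDF views, the sketch below renders one bundle both ways as N-Triples lines, using only the Java standard library (no Jena). It is an illustration only: the class BundleRdf is hypothetical, and the namespace http://example.org/mye# is a placeholder, since the slides use the mye: prefix without giving its URI. The rdf:type and owl:sameAs IRIs are the standard ones.

```java
import java.util.*;

/** Hypothetical exporter: one bundle rendered in the class model and in the pair model. */
class BundleRdf {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";
    // Placeholder namespace: the real one behind the mye: prefix is not given in the slides.
    static final String MYE = "http://example.org/mye#";

    private static String triple(String subject, String predicate, String object) {
        return "<" + subject + "> <" + predicate + "> <" + object + "> .";
    }

    /** Equivalence class model: one type triple plus one has-entity triple per member. */
    static List<String> classModel(String bundleUri, List<String> entityUris) {
        List<String> triples = new ArrayList<>();
        triples.add(triple(bundleUri, RDF_TYPE, MYE + "Bundle"));
        for (String entity : entityUris) triples.add(triple(bundleUri, MYE + "has-entity", entity));
        return triples;
    }

    /** Mapping pair model: an owl:sameAs triple for every ordered pair of distinct members. */
    static List<String> pairModel(List<String> entityUris) {
        List<String> triples = new ArrayList<>();
        for (String a : entityUris)
            for (String b : entityUris)
                if (!a.equals(b)) triples.add(triple(a, OWL_SAME_AS, b));
        return triples;
    }
}

class BundleRdfDemo {
    public static void main(String[] args) {
        List<String> bundle = Arrays.asList(
            "http://dbpedia.org/resource/Barak_h_obama",
            "http://en.wikipedia.org/wiki/Barack_Obama",
            "http://www.freebase.com/view/en/barack_obama");
        BundleRdf.classModel("http://example.org/bundle/2", bundle).forEach(System.out::println);
        BundleRdf.pairModel(bundle).forEach(System.out::println);
    }
}
```

For three entities the class model gives 4 triples and the pair model 6; the gap widens as bundles grow.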

Page 17: myEquivalents, aka a new cross-reference service

Related Work

It manages entities that share accessions
  e.g., PubMed and CiteXplore
So, not enough for us
But would be great to integrate!

Page 18: myEquivalents, aka a new cross-reference service

Future: MIRIAM and identifiers.org support

Services & Entities

Service Collection

Page 19: myEquivalents, aka a new cross-reference service

Future: MIRIAM and identifiers.org support

Service Collection

Services

Entity

Page 20: myEquivalents, aka a new cross-reference service

Combining MIRIAM and myEquivalents

Uniprot P62158
http://www.ncbi.nlm.nih.gov/protein/P62158

MIR:00100023 4599080
http://www.ebi.ac.uk/citexplore/citationDetails.do?dataSource=MED&externalId=4599080
HubMed 4599080

Mappings stored in myEquivalents

Computed by MIRIAM

Resources imported from MIRIAM

Page 21: myEquivalents, aka a new cross-reference service

Issues: Access Control (on-going)

We assume:
  updates are managed by just a few people, within the same organisation and collaborating team
  most of the data is publicly readable
    except private entities (maybe)
Implies a very simple model, where users can have the roles of:
  reader, can only read public stuff
    the only role given to anonymous (i.e., unauthenticated) users
  editor, can change everything (mappings, service descriptions, etc.)
  admin, can administer users and permissions
Though simple, it's a good base for managing provenance too
Authentication details
  all requests contain user + hash(password), and travel via SSL/HTTPS and via POST
  makes it unnecessary to have complex mechanisms based on a shared secret (e.g., OAuth)
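
The role model and the user + hash(password) check could be sketched as follows. This is an assumption-laden illustration, not the project's code: the slides do not name the hash function, so SHA-256 is picked here as an example; the roles are assumed to be cumulative (an admin can also edit); and Role, Credentials and UserDirectory are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

/** The three roles from the slide; assumed cumulative, i.e. ADMIN includes EDITOR rights. */
enum Role {
    READER, EDITOR, ADMIN;

    boolean canReadPrivate() { return this != READER; }
    boolean canEdit()        { return this == EDITOR || this == ADMIN; }
    boolean canManageUsers() { return this == ADMIN; }
}

class Credentials {
    /** Hex-encoded SHA-256 of the password (the choice of SHA-256 is an assumption). */
    static String sha256Hex(String password) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(password.getBytes(StandardCharsets.UTF_8)))
                hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 not available", ex);
        }
    }
}

/** Hypothetical user registry: maps each request's credentials to a role. */
class UserDirectory {
    private final Map<String, String> passwordHashes = new HashMap<>();
    private final Map<String, Role> roles = new HashMap<>();

    void addUser(String user, String password, Role role) {
        passwordHashes.put(user, Credentials.sha256Hex(password));
        roles.put(user, role);
    }

    /** A null user is anonymous and gets READER; wrong credentials are rejected. */
    Role roleFor(String user, String passwordHash) {
        if (user == null) return Role.READER;
        String expected = passwordHashes.get(user);
        if (expected == null || !expected.equals(passwordHash))
            throw new SecurityException("Invalid credentials for user " + user);
        return roles.get(user);
    }
}

class AccessControlDemo {
    public static void main(String[] args) {
        UserDirectory users = new UserDirectory();
        users.addUser("curator", "s3cret", Role.EDITOR);
        // The client sends user + hash(password) with each request, over HTTPS.
        Role role = users.roleFor("curator", Credentials.sha256Hex("s3cret"));
        System.out.println(role + " can edit: " + role.canEdit());
        System.out.println("anonymous can edit: " + users.roleFor(null, null).canEdit());
    }
}
```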

Page 22: myEquivalents, aka a new cross-reference service

Issues: Versioning (future?)

That's been ignored so far
  because we're assuming one version ↔ one accession ↔ one URI
  and leaving the versioning fun to the repositories
Must be addressed later
Possible scenario:
  Entities are identified by means of service + accession + version
  New version relations are added (has-version, is-prior-version, has-next-version)
  It is still one URI ↔ one entity at the level of a given version
    URI pattern contains an additional placeholder for the version
  It's up to the myEquivalents clients to either:
    omit the version (i.e., the latest version is always assumed, even when the version increases)
    specify a given version (requires manual version update)
Possibly: keep history of all versions
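
The scenario above could look roughly like this in Java. It is a sketch of an idea the slides mark as future work, so everything here is hypothetical: the class VersionedService, integer version numbers, and the "$id" and "$ver" placeholders in the URI pattern. Omitting the version (passing null) resolves to the latest one known at call time.

```java
import java.util.*;

/** Hypothetical versioned service: entities are identified by service + accession + version. */
class VersionedService {
    // Assumed pattern syntax: "$id" for the accession, "$ver" for the version.
    private final String uriPattern;
    private final Map<String, TreeSet<Integer>> versions = new HashMap<>();

    VersionedService(String uriPattern) { this.uriPattern = uriPattern; }

    /** Registers a version of an accession; the history of all versions is kept. */
    void addVersion(String accession, int version) {
        versions.computeIfAbsent(accession, k -> new TreeSet<>()).add(version);
    }

    /** Builds the URI of a given version; a null version means "the latest one". */
    String uri(String accession, Integer version) {
        TreeSet<Integer> known = versions.get(accession);
        if (known == null) throw new NoSuchElementException("Unknown accession: " + accession);
        int resolved = (version == null) ? known.last() : version;
        if (!known.contains(resolved))
            throw new NoSuchElementException("Unknown version " + resolved + " of " + accession);
        return uriPattern.replace("$id", accession).replace("$ver", Integer.toString(resolved));
    }
}

class VersioningDemo {
    public static void main(String[] args) {
        VersionedService samples = new VersionedService("http://example.org/samples/$id/v$ver");
        samples.addVersion("SAMEA597705", 1);
        samples.addVersion("SAMEA597705", 2);
        System.out.println(samples.uri("SAMEA597705", null)); // latest version
        System.out.println(samples.uri("SAMEA597705", 1));    // pinned version
    }
}
```

A client that omits the version follows new releases automatically, while a client that pins one keeps a stable link and has to update it by hand.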

Page 23: myEquivalents, aka a new cross-reference service

That's all!

Thank You!

Have a look at the code and the wiki (on-going work!):
http://github.com/EBIBioSamples/myequivalents