© Keith G Jeffery, Anne G S Asserson GL 10 Amsterdam 2008 200812 1
Keith G Jeffery, Director, IT & International Strategy, STFC
Anne G S Asserson, Research Department, University of Bergen
INTEREST: INTERoperation for Exploitation, Science and Technology
Authors
Anne Asserson UiB
Keith G Jeffery STFC-RAL
Structure
• Background
• The Hypothesis
  – Remote Wrapper
  – Local Wrapper
  – Catalog
  – Catalog Plus Pull (ERGO2++)
  – Full CERIF
  – Harvesting
• Conclusion
Background: GL
• Grey literature is important but is only a small component of the total research information environment and must be seen in the context of the overall research process
• Grey literature is a product
• To understand the product, one needs information on the sources and the process, i.e. the research context
Do not try to obtain information through a ‘fog’ backwards from GL metadata
Get it moving forwards through the research process; then much GL metadata is derived directly and consistently
Background: Access
• Interoperation: homogeneous access to distributed heterogeneous information
  – Query against schema (of user)
  – Translation to other schemas (of sources)
  – Answer reconciled to original schema (of user)
  – With a common interoperation format: n interfaces
  – Without one: n(n-1) interfaces
• Utilise one common interoperation format
• [Character set, language, syntax, semantics]
• The alternative is ‘google-like’, where the end-user has to do the translations and reconciliations
• This does not scale
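The interface counts above can be checked with a short sketch. The function names are illustrative and not from the slides. It assumes that each pairwise converter works in one direction only, which is what gives n(n-1), and that with a common format each source needs a single interface to that format.

```python
def interfaces_with_common_format(n: int) -> int:
    """Each of the n sources needs one interface to the common format."""
    return n


def interfaces_pairwise(n: int) -> int:
    """Each source needs a one-way converter to every other source."""
    return n * (n - 1)


# Compare the two approaches as the number of sources grows.
for n in (3, 10, 50):
    print(n, interfaces_with_common_format(n), interfaces_pairwise(n))
```

The pairwise count grows quadratically while the common-format count grows linearly, which is the scaling argument the slide makes.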
Background: Metadata
• Grey literature repositories can be interoperated without CERIF-CRIS using OAI-PMH and DC (OAISTER)
• Grey literature repositories provide better recall and relevance when interlinked via CERIF-CRIS – research context
• formal syntax, declared semantics
• Metadata
  – Schema, Navigational, Associative {descriptive, restrictive, supportive}
• The key to everything is quality metadata
  – input validation, query/retrieval, relationship linking, INTEROPERATION
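As a rough illustration of the OAI-PMH and DC route mentioned above, the sketch below builds a harvesting request URL. `ListRecords`, `metadataPrefix`, `oai_dc` and `set` are part of OAI-PMH; the base URL and the function name are placeholders, not a real repository or a library API.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for Dublin Core records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional selective harvesting by set
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, assumed for illustration only.
print(list_records_url("https://repository.example.org/oai"))
```

A request like this returns flat descriptive records only; the research context (projects, persons, organisational units) that the slide attributes to CERIF-CRIS is not part of it.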
Background
CERIF: EU Recommendation to Member States
[Figure: CERIF entity diagram. Core entities: PROJECT, PERSON, ORGUNIT. Linked entities: Skills, CV, General Facility, Particular Equipment, Contact, Results Publication, Results Patent, Results Product, Service, Funding Programme, Event, Classification, Prize/Award.]
Result Publication Instance Diagram
[Figure: instance diagram linking Person A, Publication X, OrgUnit O, OrgUnit M, OrgUnit N and Project P through the roles member, employee, part of, owns IPR, author and project leader.]
Metadata in CERIF-CRIS is much richer than in a usual repository
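The idea of role-typed links between entities can be sketched as a small data structure. The entity and role names come from the diagram, but which entity each role connects is an assumption made here for illustration, and the `Link` class and `related` helper are hypothetical, not CERIF definitions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Link:
    source: str
    role: str
    target: str


# Assumed edge assignments; the diagram's actual connections may differ.
links = [
    Link("Person A", "author", "Publication X"),
    Link("Person A", "project leader", "Project P"),
    Link("Person A", "employee", "OrgUnit O"),
    Link("OrgUnit O", "part of", "OrgUnit M"),
    Link("OrgUnit M", "owns IPR", "Publication X"),
]


def related(entity: str, all_links: List[Link]) -> List[Tuple[str, str]]:
    """Return (role, other entity) pairs for every link touching `entity`."""
    pairs = []
    for link in all_links:
        if link.source == entity:
            pairs.append((link.role, link.target))
        elif link.target == entity:
            pairs.append((link.role, link.source))
    return pairs


print(related("Publication X", links))
```

Starting from a publication, the links lead to its author and the organisation holding the IPR, and from there to projects and parent organisations; a flat repository record has no such navigable context.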
CERIF-CRIS + Repositories at 1 institution
[Figure: an End-User; a CRIS holding the research context (projects, persons, organisational units, funding, products, patents, publications, facilities, equipment, events); an OA repository of (hypermedia) documents; an e-Research repository of datasets and software. Link labels: OAI-PMH, various protocols, CERIF.]
... and multiple institutions
[Figure: Institutions A, B and C, each with a CRIS, an OA repository, an e-Research repository and an End-User.]
Hypothesis
• Comparison of possible architectures for interoperation of grey repositories
  – (of publications or data and software)
• Leads inexorably to:
• CERIF should be used either:
  – as the native storage format,
  – as the storage format of a derived data warehouse (transformed copy of the CRIS), or
  – as the export format converted from the CRIS native format using a wrapper.
Remote Wrapper
[Figure: Remote Wrapper architecture. The user side has a query form, a presentation form, a presentation convertor and integration schemas; dispatchers and receivers (with addresses) connect it over the network to the non-CERIF CRISs on their LANs; each host has a query schema, a query convertor and an answer convertor.]
Remote Wrapper
• the user needs only a web browser and a simple query form
• the host has to write a query converter
• the host has to write an answer (XML?) converter (to a specific XML DTD?)
• the query expressivity is very limited
• the user client has to write an integrator for the answers
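A minimal sketch of the remote wrapper division of labour, under stated assumptions: each host runs its own query and answer converters, and the client only integrates the answers. The class, field names and sample records are all hypothetical, and the query is limited to exact matches to mirror the limited expressivity noted above.

```python
class RemoteWrapper:
    """Runs at a host: converts queries to the native schema and answers back."""

    def __init__(self, records, common_to_native):
        self.records = records
        self.common_to_native = common_to_native
        self.native_to_common = {n: c for c, n in common_to_native.items()}

    def convert_query(self, query):
        # Query converter: rename common fields to the host's native fields.
        return {self.common_to_native[f]: v for f, v in query.items()}

    def convert_answer(self, record):
        # Answer converter: rename native fields back to the common fields.
        return {self.native_to_common[f]: v
                for f, v in record.items() if f in self.native_to_common}

    def ask(self, query):
        native = self.convert_query(query)
        hits = [r for r in self.records
                if all(r.get(f) == v for f, v in native.items())]
        return [self.convert_answer(r) for r in hits]


def integrate(answers_per_host):
    """Client-side integrator: concatenate the converted answers."""
    merged = []
    for answers in answers_per_host:
        merged.extend(answers)
    return merged


host_a = RemoteWrapper(
    [{"titel": "Grey report", "forfatter": "Asserson"}],
    {"title": "titel", "author": "forfatter"},
)
host_b = RemoteWrapper(
    [{"name": "CERIF overview", "creator": "Jeffery"},
     {"name": "Grey data", "creator": "Asserson"}],
    {"title": "name", "author": "creator"},
)

query = {"author": "Asserson"}
print(integrate([host_a.ask(query), host_b.ask(query)]))
```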
Local Wrapper
[Figure: Local Wrapper architecture. The user side holds the query form, presentation form, query convertor, presentation convertor, integration and the schemas; a dispatcher and receiver (with addresses) connect it over the network to receiver/dispatcher pairs at the non-CERIF CRISs on their LANs.]
Local Wrapper
• each host has only to supply and update its schema to the client (all clients if there is not a central query server)
• each host has no software to provide except receiver and dispatcher
• the client (if it is a central service) has a very large workload
• if there is no central service then each client has to have all schemas supplied and updated
• the client software has to include a complex query refiner
• the client software has to include multiple complex query converters
• the client software has to include a complex answer integrator
• the client software has to include a presentation converter (complexity depends on specification of presentation required and complexity of the answer structure)
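The shift of work to the client can be sketched as follows. Here the hosts only receive native queries and return native records, while the client holds every host's schema and performs all conversion and integration. Host names, schemas and records are hypothetical placeholders.

```python
# Schemas supplied by each host to the client: common field -> native field.
SCHEMAS = {
    "host_a": {"title": "titel", "author": "forfatter"},
    "host_b": {"title": "name", "author": "creator"},
}

# Stand-ins for the hosts' native databases.
HOST_DATA = {
    "host_a": [{"titel": "Grey report", "forfatter": "Asserson"}],
    "host_b": [{"name": "CERIF overview", "creator": "Jeffery"}],
}


def host_receive(host, native_query):
    """Host side: only a receiver and dispatcher answering in the native schema."""
    return [r for r in HOST_DATA[host]
            if all(r.get(f) == v for f, v in native_query.items())]


def client_query(common_query):
    """Client side: one query converter and answer converter per host schema."""
    results = []
    for host, schema in SCHEMAS.items():
        native_query = {schema[f]: v for f, v in common_query.items()}
        inverse = {n: c for c, n in schema.items()}
        for record in host_receive(host, native_query):
            converted = {inverse[f]: v for f, v in record.items() if f in inverse}
            converted["source"] = host
            results.append(converted)
    return results


print(client_query({"author": "Jeffery"}))
```

Every schema change at any host has to reach `SCHEMAS` at every client, which is the maintenance burden the bullets describe.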
Catalog
[Figure: Catalog architecture. Construction phase from each host: CRIS, schema, convertor, standard query, dispatcher, network, receiver, loader, CERIF Metadata Catalog. Retrieve phase by user (user phase 1): query form, query, CERIF Metadata Catalog, hit list.]
Catalog
• simple query on union catalog (which may be centralised or replicated)
• possibly not all required entities and attributes in catalog
• effort to populate catalog; requires converter at each host to supply CERIF metadata
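The two phases of the catalog architecture can be sketched like this. The attribute names `title` and `author` are simplified placeholders rather than real CERIF attributes, and the host records and converter functions are invented for illustration.

```python
catalog = []  # union catalog of CERIF-style metadata entries


def load(host, records, to_cerif):
    """Construction phase: a host-side converter feeds the catalog loader."""
    for record in records:
        entry = to_cerif(record)
        entry["host"] = host
        catalog.append(entry)


def search(**criteria):
    """Retrieve phase: one simple query on the union catalog gives a hit list."""
    return [e for e in catalog
            if all(e.get(k) == v for k, v in criteria.items())]


load("host_a",
     [{"titel": "Grey report", "forfatter": "Asserson"}],
     lambda r: {"title": r["titel"], "author": r["forfatter"]})
load("host_b",
     [{"name": "CERIF overview", "creator": "Jeffery"}],
     lambda r: {"title": r["name"], "author": r["creator"]})

print(search(author="Jeffery"))
```

Anything a converter does not map, such as a native field with no catalog counterpart, is simply absent from the catalog, which corresponds to the second bullet.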
Catalog Plus Pull (ERGO2++)
[Figure: Catalog Plus Pull architecture. User phase 1: query form and query against the CERIF Metadata Catalog, then hit list processing. User phase 2: unique id queries sent via dispatcher and receiver (with addresses) over the network to the non-CERIF CRISs on their LANs, with results shown in the presentation form.]
Catalog Plus Pull (ERGO2++)
• advantage of simplicity as for catalog-only architecture
• advantage of additional information provision
• disadvantage that additional information is heterogeneous (unless converted to CERIF export data model)
• disadvantage of hosts having to maintain entries representing their database content in the CERIF metadata catalog
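A sketch of the two user phases, assuming each catalog entry carries a unique id and the name of its host. All ids, host names and records are hypothetical; this is not the ERGO implementation.

```python
# Phase 1 data: homogeneous catalog entries with unique ids.
CATALOG = [
    {"id": "a:1", "host": "host_a", "title": "Grey report"},
    {"id": "b:7", "host": "host_b", "title": "Grey data set"},
]

# Phase 2 data: each host's native records, keyed by the same unique ids.
HOSTS = {
    "host_a": {"a:1": {"titel": "Grey report", "sider": 12}},
    "host_b": {"b:7": {"name": "Grey data set", "format": "CSV"}},
}


def phase1(term):
    """Query the catalog and return a hit list."""
    return [e for e in CATALOG if term.lower() in e["title"].lower()]


def phase2(hit):
    """Pull the full record from the owning host using the unique id."""
    return HOSTS[hit["host"]][hit["id"]]


for hit in phase1("grey"):
    print(hit["id"], phase2(hit))
```

The pulled records keep their native field names, so the additional information is heterogeneous unless each host also converts it to a CERIF export model.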
Full CERIF
[Figure: Full CERIF architecture. The user's query form and presentation form connect via a dispatcher and receiver (with addresses) over the network to receiver/dispatcher pairs at the CERIF CRISs on their LANs, each receiving the query.]
Full CERIF
• very simple and easy to use for the end-user
• each host has to either run a full CERIF model database or provide a full CERIF model version of the host database
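When every host exposes the same model, the client side reduces to dispatching one query and merging the answers. The sketch below assumes simplified attribute names standing in for CERIF attributes; hosts and records are hypothetical.

```python
def matches(record, query):
    """Exact-match test of a record against the query criteria."""
    return all(record.get(k) == v for k, v in query.items())


def query_all(hosts, query):
    """Send the identical query to every host; no converters are needed."""
    return [dict(record, host=name)
            for name, records in hosts.items()
            for record in records
            if matches(record, query)]


hosts = {
    "host_a": [{"title": "Grey report", "author": "Asserson"}],
    "host_b": [{"title": "CERIF overview", "author": "Jeffery"},
               {"title": "Grey data", "author": "Asserson"}],
}

print(query_all(hosts, {"author": "Asserson"}))
```

Because the answers share one structure, they can be counted, sorted or reported on directly.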
Harvesting (construction phase)
[Figure: converters at the non-CERIF CRISs publish their content as HTML pages; a crawling robot collects the pages over the network into a catalog of documents with associative descriptive metadata.]
Harvesting (search phase)
[Figure: User phase 1: query form and query against the harvester's associative descriptive metadata catalog, then hit list processing. User phase 2: URL queries sent via dispatcher and receiver (with addresses) over the network to retrieve HTML pages from each CRIS, shown in the presentation form.]
Harvesting
• The host has to provide a copy of the database as web pages, available to the search robot and to subsequent accesses made by clicking on the URL in the metadata.
• The query is based on existence of term(s); constraining by entity or attribute is not possible (without sophisticated XML form processing).
• The results are unstructured and one page at a time (click on URL in metadata catalog to see page); this inhibits statistical processing or report generation.
• It is easy to implement and maintain (although the database may be ~2 weeks out of date) and has a familiar interface for many WWW users.
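Term-existence search over harvested pages can be sketched with a small inverted index. The URLs and page texts are invented, and a real crawler and tokeniser would be far more involved; this only illustrates why entity or attribute constraints are unavailable.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Construction phase: map each term to the URLs of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(url)
    return index


def search(index, *terms):
    """Search phase: return URLs of pages containing all the given terms."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical HTML page contents published by two CRISs.
pages = {
    "http://cris-a.example.org/p1": "Project P led by Person A",
    "http://cris-b.example.org/x": "Publication X by Person A",
}
index = build_index(pages)

print(search(index, "person"))
print(search(index, "publication"))
```

A search for "person" returns both pages whether Person A appears as project leader or as author, since the index records only that a term occurs, not the entity or role it belongs to.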
Conclusion
To interoperate grey repositories, link them to a CRIS
Best: Full CERIF architecture
Else: wrap CRIS to interoperate using CERIF