Exploring IR Technologies
Last revised: 23 February 2006
Ki Tat LAM
Head of Library Systems
The Hong Kong University of Science and Technology
[email protected]
IR Workshop
Managing Scholarly Assets in Institutional Repositories: Sharing Experiences Among JULAC Libraries
24 February 2006, HKUST Library
IR Workshop – Exploring IR Technologies – K.T. Lam, HKUST Library
Contents
  DSpace Software
    SRW/U, Usage statistics, OpenURL
  Cross-Searching Technologies
    Search engines – Google
    OAI-PMH – OAIster, Scirus, HKIR
  HKIR
    Standardization
      • Author names; subjects; document types; metadata schema
    Document deposition versus linking
    Research Assessment Exercise
DSpace Software
  Jointly created by MIT Libraries and Hewlett-Packard Company [http://www.dspace.org/]
  Open source software – released since 2002
  Adopted by HKUST Library for its IR since February 2003 [http://repository.ust.hk/]
  Also adopted for HKUST’s Digital University Archives – migrated to DSpace in October 2004 [http://archives.ust.hk/]
DSpace Software [cont.]
  HKUST’s Electronic Journals Online searching service will soon be migrated to DSpace [http://lbapps.ust.hk/ej/]
  Adopted by CUHK for its IR (known as SiR) since mid-2004 [http://dspace.lib.cuhk.edu.hk/]
  Adopted by CityU for its IR since 2005 [http://dspace.cityu.edu.hk/]
  Will be adopted by HKIEd for building its IR
IR Software and Services
  Open Source Software
    DSpace
    GNU EPrints
    Fedora
    See OSI Guide to Institutional Repository Software [http://www.soros.org/openaccess/software/]
  Commercial Software
    VITAL from VTLS Inc. – powered by Fedora
    DigiTool from Ex Libris
    Symposia from Innovative Interfaces Inc.
IR Software and Services [cont.]
  Commercial Hosting Services
    Digital Commons from ProQuest – powered by the bepress platform
DSpace at HKUST
  As of 19 February 2006:
    Home URL: http://repository.ust.hk/
    IR Software: DSpace Version 1.3.2
    System Software: Fedora Core 4 Linux; Tomcat 5.0; JDK 1.4.2
    Server: Intel Pentium 4 3GHz; 3GB RAM; 80GB hard disk
    Content: 2231 documents from 42 departments
    Usage: Documents were accessed 74,467 times since October 2004
DSpace at HKUST
  Customizations
    Document submission form
    Add item form
    CJK support
    Authentication and authorization
    SRW/U interface
    Collection and usage statistics
    OpenURL linking
DSpace at HKUST [cont.]
  SRW/U Interface
    Search and Retrieval for the Web (or by URL)
    Base URL: [http://repository.ust.hk/SRW/search/DSpace]
    Alternative way of searching the repository - using standard web services
    Allows search service providers to issue a federated search to various IRs and deliver the search results in their own GUI interface
Response to the following SRW search request:
http://repository.ust.hk/SRW/search/DSpace?query=dc.creator+%3D+%22ip+nancy%22&operation=searchRetrieve&maximumRecords=1&startRecord=1...
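A request like the one above can be assembled programmatically. The following is a minimal Python sketch, assuming only the base URL and the four parameters visible on the slide. The slide truncates the URL with "...", so any further parameters are not reproduced here. The helper name build_sru_url is illustrative and is not part of DSpace or of any SRW/U toolkit.

```python
from urllib.parse import urlencode

# Base URL as given on the SRW/U Interface slide.
BASE_URL = "http://repository.ust.hk/SRW/search/DSpace"


def build_sru_url(cql_query, start_record=1, maximum_records=1, base_url=BASE_URL):
    """Return a searchRetrieve URL for the given CQL query (illustrative helper)."""
    params = [
        ("query", cql_query),
        ("operation", "searchRetrieve"),
        ("maximumRecords", str(maximum_records)),
        ("startRecord", str(start_record)),
    ]
    # urlencode percent-encodes the CQL query:
    # spaces become "+", "=" becomes %3D and the double quote becomes %22.
    return base_url + "?" + urlencode(params)


url = build_sru_url('dc.creator = "ip nancy"')
print(url)
```

Keeping the CQL query as a plain string and leaving the encoding to urlencode avoids hand-written escapes such as %3D and %22.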
XSLT-converted response to the following SRW search request:
http://repository.ust.hk/SRW/search/DSpace?query=dc.creator+%3D+%22ip+nancy%22&operation=searchRetrieve&maximumRecords=1&startRecord=1...
DSpace at HKUST [cont.]
  Size of the Repository [http://repository.ust.hk/dspace/dbstat.jsp]
    Compiles in real time the number of items, collections and communities in the Repository
  Top 20 Most Accessed Documents [http://repository.ust.hk/dspace/top20.jsp]
    Compiled every month against the Tomcat web access logs
    Excludes access by most robots
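The monthly compilation described above could be sketched roughly as follows. This is not the HKUST script, which the slides do not show. It assumes the access log is written in the Apache-style "combined" format, that document downloads appear as /bitstream/<handle>/... paths, and that robots can be recognized by a few user-agent keywords. The sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Assumption: "combined"-format lines (request, status, size, referer, user agent).
LOG_LINE = re.compile(
    r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
# Assumption: a download path contains /bitstream/<prefix>/<suffix>/...
BITSTREAM = re.compile(r"/bitstream/(?P<handle>\d+(?:\.\d+)*/\d+)/")
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")  # catches most robots, not all


def top_documents(lines, n=20):
    """Count successful document downloads per handle, skipping likely robots."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if not m or m.group("status") != "200":
            continue
        agent = m.group("agent").lower()
        if any(hint in agent for hint in ROBOT_HINTS):
            continue
        doc = BITSTREAM.search(m.group("path"))
        if doc:
            counts[doc.group("handle")] += 1
    return counts.most_common(n)


# Invented sample lines, for illustration only.
sample = [
    '1.2.3.4 - - [01/Feb/2006:10:00:00 +0800] "GET /dspace/bitstream/1783.1/1805/1/paper.pdf HTTP/1.1" 200 1234 "-" "Mozilla/4.0"',
    '1.2.3.5 - - [01/Feb/2006:10:01:00 +0800] "GET /dspace/bitstream/1783.1/1805/1/paper.pdf HTTP/1.1" 200 1234 "-" "Googlebot/2.1"',
    '1.2.3.6 - - [01/Feb/2006:10:02:00 +0800] "GET /dspace/bitstream/1783.1/99/1/thesis.pdf HTTP/1.1" 200 999 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [01/Feb/2006:10:03:00 +0800] "GET /dspace/bitstream/1783.1/1805/1/paper.pdf HTTP/1.1" 200 1234 "-" "Mozilla/4.0"',
]
ranking = top_documents(sample)
print(ranking)
```

Keyword matching on the user agent is the reason such a report can only exclude "most" robots: a crawler that presents a browser-like user agent is still counted.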
DSpace at HKUST [cont.]
  OpenURL
    All documents deposited in the HKUST IR must meet the open access criterion
    Two solutions to link to non-open access documents were explored:
      • Direct linking to the documents as found in the library subscribed databases
      • OpenURL for Link Resolver
    OpenURL approach was adopted because:
      • More persistent than vendor-provided URLs
      • Transparent to which databases are subscribed locally
DSpace at HKUST [cont.]
  One disadvantage of the OpenURL approach – what if the in-house link resolver fails to find a target link? e.g.
    • Host of the document is not OpenURL capable
    • Database not subscribed by the library
    • Target not profiled by the local link resolver
  Developed a data entry interface to assist in the construction of OpenURL
  Demonstration:
    • Sample item with OpenURL
    • Staff interface for OpenURL construction
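What a data entry interface of this kind produces can be illustrated with a small Python sketch. It builds an OpenURL 0.1-style link from citation fields. The resolver address is hypothetical (the slides do not give the WebBridge base URL), the citation values are invented, and the helper name build_openurl is illustrative.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; replace with the local link resolver's base URL.
RESOLVER_BASE = "http://resolver.example.org/openurl"


def build_openurl(citation, resolver_base=RESOLVER_BASE):
    """Build an OpenURL 0.1-style link from citation fields, skipping empty ones."""
    keys = ("genre", "issn", "date", "volume", "issue", "spage", "atitle", "aulast")
    pairs = [(key, citation[key]) for key in keys if citation.get(key)]
    return resolver_base + "?" + urlencode(pairs)


# Invented citation values, for illustration only.
article = {
    "genre": "article",
    "issn": "1234-5679",
    "date": "2005",
    "volume": "12",
    "spage": "345",
    "atitle": "A sample article",
}
link = build_openurl(article)
print(link)
```

Because the link carries only citation metadata, it stays valid when the vendor changes its own URLs. Resolving it is left to whichever link resolver the reader's library runs.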
Click on this image to launch HKUST’s WebBridge link resolver to locate the published version
Document deposited in the Repository is a pre-published version
Click on this link to retrieve the article hosted on Elsevier’s ScienceDirect platform
Click on this link to view the full text of this article
OpenURL constructed
View Item
Edit Item
Build OpenURL
Click this button to create this OpenURL fragment
Click this link to test the OpenURL
Check INNOPAC for bib record and then auto-insert the ISSNs to the form
Cross-Searching IRs
  Cross-searching approaches
    If the IR site is open for robot access, documents are very likely available in major search engines, such as Google and Yahoo.
    Indexing services harvest IR metadata using the OAI-PMH protocol:
      • OAIster from University of Michigan [http://oaister.umdl.umich.edu/]
      • Scirus from Elsevier [http://www.scirus.com/]
      • HKIR – an experimental system by HKUST Library [http://lbapps.ust.hk/hkir/]
Document indexed by Google
Document indexed by Google Scholar
Document indexed by OAIster
Click this link to search HKUST IR on Scirus
Scirus search results page will look like this
Cross-Searching IRs [cont.]
  OAI-PMH
    A protocol developed by the Open Archives Initiative for harvesting metadata from distributed repositories
    Most IR software, including DSpace, is OAI-PMH capable
    Indexing services such as OAIster are OAI data harvesters
    IRs are OAI data providers
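The harvester and data provider roles can be made concrete with a short Python sketch of a harvesting loop. It issues ListRecords requests and follows resumption tokens until the provider returns none. The network call is abstracted behind a fetch callable so the example runs offline. The fake_provider function and its record identifiers are invented stand-ins for a real repository.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Yield record identifiers from a data provider, following resumption tokens.

    `fetch` is any callable that takes a dict of OAI-PMH request parameters
    and returns the XML response as text.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        token = root.findtext(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        if not token:
            break
        # A resumed request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}


def fake_provider(params):
    """Stand-in for a real repository: two pages of invented records."""
    page2 = params.get("resumptionToken") == "page2"
    ids = ["oai:example.org:3"] if page2 else ["oai:example.org:1", "oai:example.org:2"]
    records = "".join(
        f"<record><header><identifier>{i}</identifier></header></record>" for i in ids
    )
    token = "" if page2 else "page2"
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"{records}<resumptionToken>{token}</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )


harvested = list(harvest_identifiers(fake_provider))
print(harvested)
```

In real use, fetch would wrap an HTTP GET against the data provider's base URL. Everything else in the loop stays the same.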
OAI-PMH “GetRecord” request by URL: http://repository.ust.hk/dspace-oai/request?verb=GetRecord& ... 1783.1/1805
OAI-PMH’s XML output in response to a “GetRecord” request
Metadata in Unqualified Dublin Core metadata schema (oai_dc)
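To show what a harvester does with such a response, here is a minimal Python sketch that extracts the unqualified Dublin Core fields from a GetRecord reply. The XML is an invented, heavily trimmed sample and not the actual HKUST output shown on the slide. Only the three namespace URIs are taken from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample; a real response also carries responseDate, request, datestamp, etc.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record>
    <header><identifier>oai:example.org:123/456</identifier></header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>A sample title</dc:title>
        <dc:creator>Author, First</dc:creator>
        <dc:creator>Author, Second</dc:creator>
        <dc:identifier>http://hdl.example.org/123/456</dc:identifier>
      </oai_dc:dc>
    </metadata>
  </record></GetRecord>
</OAI-PMH>"""


def parse_oai_dc(xml_text):
    """Return {element name: [values]} for the unqualified DC fields of one record."""
    root = ET.fromstring(xml_text)
    dc_root = root.find(".//oai_dc:dc", NS)
    fields = {}
    for element in dc_root:
        name = element.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
        fields.setdefault(name, []).append(element.text)
    return fields


record = parse_oai_dc(SAMPLE_RESPONSE)
print(record)
```

Each field maps to a list because any oai_dc element may repeat, as the two creators in the sample do.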
HKIR
  HKIR - an experimental system developed by the HKUST Library to demonstrate the features of harvesting and cross-searching the scholarly and research output from the Hong Kong UGC funded institutions [http://lbapps.ust.hk/hkir/]
  Powered by the DSpace software
  Equipped with OCLC’s OAIHarvester2 software for harvesting OAI metadata from IRs
HKIR [cont.]
  Databases harvested (as of 22 Feb 2006):
    CUHK SiR [70 records]
    CityU Institutional Repository [425 records]
    HKUST Electronic Theses [1,681 records]
    HKUST Institutional Repository [2,126 records]
    HKU Theses Online [13,583 records]
Possible add-on to aid UGC’s research assessment exercise
This record was harvested from CUHK’s IR and it is in their Fine Arts collection
Click on this link to go to the record in CUHK’s IR
A sample HKIR record
A sample HKIR record showing fields labeled in qualified Dublin Core elements
HKIR supports OpenURLs harvested from local IRs
HKIR [cont.]
  Standardization Issues
    Author names standardization
    Subject analysis
      • Free vocabulary versus thesaurus
      • Adopt same thesaurus among institutions?
    Document types
      • Adopt same set of definitions among institutions?
    Metadata schema
      • Adopt same metadata schema?
      • Use oai_dc schema for OAI harvesting?
Author name assigned by HKUST
Author name assigned by CityU
Author names standardization
Document types assigned to the same article are different
HKIR [cont.]
  Problem on loading harvested oai_dc metadata
    oai_dc is the most popular metadata schema used by OAI data provider tools, e.g.
      • Virginia Tech’s VTOAI - used by HKUST and HKU in their Theses databases
      • OCLC’s OAICat - used by DSpace
    oai_dc does not support qualified Dublin Core
      • The qualified DC fields stored in local DSpace have to be scaled down to simple DC when exporting records to OAI harvesters
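The scaling down can be illustrated with a small Python sketch, assuming qualified fields are held as (element, qualifier, value) triples. This is a simplified model and not DSpace's actual export code. The three identifier qualifiers are the ones named in these slides, and the values are invented.

```python
def to_simple_dc(qualified_fields):
    """Collapse (element, qualifier, value) triples into unqualified DC pairs."""
    return [(element, value) for element, qualifier, value in qualified_fields]


# Invented values, for illustration only.
local_record = [
    ("identifier", "citation", "Sample Journal, v. 1, 2005, p. 1-10"),
    ("identifier", "uri", "http://hdl.example.org/123/456"),
    ("identifier", "openurl", "http://resolver.example.org/openurl?issn=1234-5679"),
    ("title", None, "A sample title"),
]
exported = to_simple_dc(local_record)
print(exported)
```

After export, the citation, the URI and the OpenURL all arrive at the harvester as plain dc:identifier values, with nothing left to say which is which.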
HKIR [cont.]
  Mapping metadata back to qualified DC for loading to HKIR is challenging
  Need to develop a HKIR version of schema that takes qualified DC
Metadata in oai_dc schema as received by the OAI harvester
dc:identifier.citation in local IR
dc:identifier.uri in local IR
dc:identifier.openurl in local IR
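Why the reverse mapping is challenging can be seen from a sketch of the guesswork a harvester would need. The rules below are assumptions made up for illustration and are not HKIR's actual loading logic. They will misclassify some values, which is the argument for harvesting a schema that keeps the qualifiers.

```python
def guess_identifier_qualifier(value):
    """Guess which qualified field a harvested dc:identifier value came from.

    Heuristic only; every rule here is an illustrative assumption.
    """
    text = value.strip().lower()
    if text.startswith(("http://", "https://")):
        # Assumption: an OpenURL has a query string with citation keys.
        if "?" in text and any(key in text for key in ("issn=", "genre=", "atitle=")):
            return "openurl"
        return "uri"
    # Anything that is not a web address is treated as a citation string.
    return "citation"


# Invented values, for illustration only.
harvested_identifiers = [
    "Sample Journal, v. 1, 2005, p. 1-10",
    "http://hdl.example.org/123/456",
    "http://resolver.example.org/openurl?issn=1234-5679",
]
guesses = [guess_identifier_qualifier(v) for v in harvested_identifiers]
print(guesses)
```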
HKIR [cont.]
  Document deposition and linking
    Deposit all open access documents to the local IRs
    If published version is in restricted access, then deposit the pre-published version and provide a link to the published version
    Use OpenURL for linking as long as the document is in a database that can be reached via link resolvers
    Otherwise, add the vendor-specific link to the metadata record
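The deposition and linking rules above amount to a small decision procedure. A minimal Python sketch follows. The function name and the wording of the returned steps are illustrative. The two boolean inputs stand for judgments that repository staff would make for each document.

```python
def linking_plan(is_open_access, reachable_via_link_resolver):
    """Return the deposit and linking steps for one document, following the rules above."""
    if is_open_access:
        return ["deposit document"]
    steps = ["deposit pre-published version"]
    if reachable_via_link_resolver:
        steps.append("link to published version with OpenURL")
    else:
        steps.append("add vendor-specific link to metadata record")
    return steps


print(linking_plan(is_open_access=False, reachable_via_link_resolver=True))
```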
HKIR [cont.]
  Research Assessment Exercise (RAE)
    Assess the quality of the research output of the academic staff
    Assist in assessing the research fund allocation to the funded institutions
    UGC is conducting RAE 2006 [http://www.ugc.edu.hk/eng/ugc/publication/prog/rae/rae.htm]
      • Each eligible academic staff submits a maximum of six publications
      • Assessed by subject panels
HKIR [cont.]
  High potential of utilizing the cross-institutional repository to assist academic staff to submit items and prepare reports
    • Go electronic – no longer need to collect submissions in printed format
  IRRA (Institutional Repositories & Research Assessment) - a project that supports RAE through IRs, for the UK RAE in 2008 [http://irra.eprints.org/]
    • Developing software for EPrints and DSpace to facilitate RAE tasks
    • DSpace version to be available in summer 2006
HKIR [cont.]
  If we have a cross-institutional repository for Hong Kong IRs, then we may consider adding support for RAE to the system
    • Next round of UGC RAE is in 2011 or 2012
Sample screen from an IR showing users selecting items for RAE submission [source: http://irra.eprints.org/software/bronze/eprints.html]
Thank You!