Metasearch Technologies: Definitions, Issues, Reference Applications
William H. Mischo & Mary C. Schlembach
Grainger Engineering Library Information Center
University of Illinois at Urbana-Champaign
Session 2441: Federated Searching
ASEE 2004 National Conference
June 22, 2004
Outline
• Distributed, heterogeneous repositories require federation and linking.
• Metasearch definitions.
• Technologies.
• UIUC RFP process.
• Issues and trends.
• Expanding use of metasearch.
• Custom reference and search applications.
Distributed Information Environment
• We live in a world of multiple, heterogeneous information resources.
– OPACs: local, regional, national shared.
– Locally mounted and remote A & I Services.
– Discrete publisher and vendor full-text repositories.
– E-Resource registries: Serials Solutions, TDNet.
– OAI search services (OAIster, NSDL) and preprint servers.
– Web search engines.
– Vertical publisher and vendor portals (ARL Portal, DOE Information Bridge, Elsevier Scirus & Scopus, EI Village, BioMed Central, Public Library of Science). Surface Web and Hidden Web.
– Institutional Repositories (DSpace).
– Instructional (course) management systems (WebCT, Blackboard).
• David Seaman: ‘we don’t shelve by publisher, why do we expect users to search by publisher.’
Metasearch as a Solution
• Distributed, heterogeneous resources and repositories require federation and linking.
• Terminology: Metasearch, parallel search, federated search, broadcast search, cross-database search, simultaneous search, search portal.
• Defining feature: search and retrieval span multiple databases, sources, platforms, protocols, and vendors at once.
• NISO Metasearch Initiative: emerging standards and best practices.
Value of Metasearch (Pro)
• To recommend and rank specific information resources for users; to facilitate search over multiple information resources.
• “Metasearching and other means of unifying search across heterogeneous products…most significant trend.”
• Deployment of algorithmic searches that mimic the behavior of a reference librarian.
• Integration of E-resources, Local Link Resolvers, and Metasearch.
Value of Metasearch (Caveats)
• Metasearch vendor: “metasearch does not provide the robust search functionality of native interfaces.” Examples: thesaurus browsing, e-mailing large (non-displayed) sets.
• LJ editorial 4/1/04, “Do we want or need metasearching?” Concerns: content, limiting, full-text links, user education, Boolean, thesauri.
• NISO Workshop: local link resolution is a mission-critical application; metasearch is not yet.
Nomenclature
• Metasearch is also used to refer to systems that search multiple Web search engines, each with its own previously crawled index, such as Google, AlltheWeb, AltaVista.
• Examples: EZ2Find, Vivisimo, Dogpile, Kartoo.
• In our world, it refers to systems that work over the distributed information environment dominated by bibliographic resources.
Federated vs. Broadcast Searching
• Federated: heterogeneous information resources are imported or “harvested” (sometimes using OAI-PMH protocols) into a local, central site and the normalized results are placed into a homogeneous database system for search and discovery.
• Examples: EI Village, ISI Web of Knowledge, OAIster, Grainger OAI Search Service, NSDL.
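The federated model can be sketched in a few lines. This is only an illustration under stated assumptions: the source names (`repo_a`, `repo_b`), their field names, and the sample records are hypothetical, and the harvesting step itself (for example over OAI-PMH) is replaced by in-memory data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    """Common (homogeneous) schema used by the local, central index."""
    source: str
    title: str
    year: str

# Hypothetical harvested records; each source uses its own field names.
HARVESTED = {
    "repo_a": [{"dc:title": "Metasearch Basics", "dc:date": "2003"}],
    "repo_b": [{"TI": "Link Resolvers in Practice", "PY": "2004"}],
}

# Per-source mapping from the common schema to native field names.
FIELD_MAPS = {
    "repo_a": {"title": "dc:title", "year": "dc:date"},
    "repo_b": {"title": "TI", "year": "PY"},
}

def normalize(source, raw):
    """Convert one native record into the common schema."""
    fmap = FIELD_MAPS[source]
    return Record(source=source, title=raw[fmap["title"]], year=raw[fmap["year"]])

def build_index(harvested):
    """Normalize every harvested record into one local list (the 'index')."""
    return [normalize(src, raw) for src, records in harvested.items() for raw in records]

def search(index, term):
    """Case-insensitive title search over the local index only."""
    term = term.lower()
    return [rec for rec in index if term in rec.title.lower()]

INDEX = build_index(HARVESTED)
```

The point of the sketch is that search time involves no remote systems at all: all of the heterogeneity is absorbed once, at harvest time, by the per-source field maps.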
Federated vs. Broadcast Searching
• Broadcast: user search arguments are sent in parallel (all at the same time) to remote, distributed systems and the search results are collected, normalized, and displayed to the user.
• Examples: MetaLib, EnCompass for Resource Access, WebFeat.
• Not mutually exclusive: a broadcast search can run over federated systems.
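A minimal sketch of the broadcast model, assuming each remote target is wrapped in a plain Python function. The three target functions, their names, and the records they return are hypothetical stand-ins for Z39.50, screen-scraping, or XML gateway connectors; none of them is a real vendor API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical connectors; each returns records in its own native format.
def search_catalog(query):
    return [{"TI": f"{query} handbook", "PY": "2001"}]

def search_index_service(query):
    return [{"title": f"Advances in {query}", "year": "2003"}]

def search_broken(query):
    raise TimeoutError("target did not respond")

TARGETS = {
    "catalog": search_catalog,
    "index_service": search_index_service,
    "broken": search_broken,
}

def normalize(target, raw):
    """Map native field names onto one common record shape."""
    return {
        "target": target,
        "title": raw.get("title") or raw.get("TI"),
        "year": raw.get("year") or raw.get("PY"),
    }

def broadcast_search(query, targets, timeout=5.0):
    """Send the query to every target at once, then collect and normalize."""
    results, errors = [], {}
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {pool.submit(fn, query): name for name, fn in targets.items()}
        for future in as_completed(futures, timeout=timeout):
            name = futures[future]
            try:
                results.extend(normalize(name, raw) for raw in future.result())
            except Exception as exc:
                # One failing target should not sink the whole search.
                errors[name] = str(exc)
    return results, errors
```

Calling `broadcast_search("robotics", TARGETS)` returns the normalized records from the two working targets plus an error entry for the failing one. Results arrive in completion order, so a real interface would either display them as they come in or sort them afterwards.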
Broadcast Search Basic Technologies
• Z39.50
• HTTP “screen-scraping”
• XML gateway and Web Services.
• Proprietary APIs.
MetaSearch Implementations
• Ex Libris MetaLib.
• Endeavor EnCompass.
• Innovative Interfaces MetaFind.
• MuseGlobal MuseSearch.
• EI Village; ISI Web of Knowledge.
• WebFeat.
• California Digital Library SearchLight system.
• Fretwell-Downing.
• Locals (NCSU, Grainger Library, Los Alamos).
Retrieval Issues
• Pass-through to native interface at point of search departure.
• Coupling of metasearch records with Local Link Resolvers.
– Providing OpenURL-enabled links to full-text, other services.
• Merging and De-Duplication.
– Partial de-duping of sequentially retrieved sets.
• Pulling over already extant full-text links from vendor systems.
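Partial de-duplication of sequentially retrieved sets can be sketched as follows. The match key (normalized title plus year) is an assumption chosen for illustration; production systems typically use richer keys such as ISSN, DOI, or author and pagination data.

```python
import re

def match_key(record):
    """Build a crude match key: lowercased, punctuation-free title plus year."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    title = " ".join(title.split())
    return (title, record.get("year", ""))

class Deduplicator:
    """De-duplicates result sets as they arrive, one batch at a time.

    The de-duping is 'partial' in the sense that each batch is compared only
    against records already retrieved, not against sets still to come.
    """

    def __init__(self):
        self.seen = {}

    def add_batch(self, records):
        """Return only the records in this batch that are new."""
        fresh = []
        for rec in records:
            key = match_key(rec)
            if key in self.seen:
                # Duplicate: remember the extra source on the first copy.
                self.seen[key]["sources"].append(rec["source"])
            else:
                merged = dict(rec, sources=[rec["source"]])
                self.seen[key] = merged
                fresh.append(merged)
        return fresh
```

Keeping the list of contributing sources on the surviving record lets the display still show every database that holds the item, which matters when only some of them offer full-text links.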
Technology Issues
• Consortium-based implementations.
• Search Statistics (COUNTER compliance).
• Vendor concerns with supporting multiple metasearch sessions – issuing a logoff to end a session.
• Search query standards – SRW/SRU, XQuery, OpenURL, one-step URL-launch searches.
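To make two of these standards concrete, the sketch below builds an SRU searchRetrieve URL and an OpenURL (Z39.88-2004, key/encoded-value journal format) with the standard library only. The parameter names come from the respective specifications; the base URLs, the sample citation, and the placeholder ISSN are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def sru_search_url(base_url, cql_query, max_records=10, start_record=1):
    """Build an SRU 1.1 searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"

def openurl_for_article(resolver_url, atitle, jtitle, issn, volume, spage, year):
    """Build an OpenURL 1.0 link that a local link resolver can act on."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": atitle,
        "rft.jtitle": jtitle,
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.spage": spage,
        "rft.date": year,
    }
    return f"{resolver_url}?{urlencode(params)}"

def query_params(url):
    """Decode a generated URL back into its parameters, for inspection."""
    return parse_qs(urlsplit(url).query)

# Hypothetical endpoints and citation data, for illustration only.
SRU_URL = sru_search_url("https://sru.example.org/catalog", 'dc.title = "federated search"')
LINK = openurl_for_article(
    "https://resolver.example.org/openurl",
    atitle="A Sample Article", jtitle="Sample Journal",
    issn="0000-0000", volume="12", spage="34", year="2004",
)
```

Both are plain HTTP GET URLs, which is why they combine well with one-step URL-launch searches: a metasearch record can carry an OpenURL to the local resolver while the search itself is issued as an SRU request.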
Future and Custom Applications
• Time of rapid development and growth in Metasearch applications. Expect continuing evolution.
• Metasearch technology is fairly easy to implement locally over selected resources.
• Focusing on applications that allow custom Best-Match and algorithmic searching that mimics a reference librarian.
Our Approach
• User interface and discovery systems that emphasize function or needs-based approaches to retrieval. Reference and Known-item.
• Metasearch technologies that offer additional opportunities beyond simultaneous search of discrete A & I Services.
• Performing multiple searches within individual resources to determine “Best-Match” search results, combined with selected simultaneous search of other resources.
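One way to read the “Best-Match” idea is as a ladder of searches within a single resource, tried from most to least precise until one returns hits. The sketch below assumes that reading; the strategy order, the tiny in-memory catalog, and all names are hypothetical and not a description of the UIUC implementation.

```python
# Hypothetical in-memory stand-in for one searchable resource.
CATALOG = [
    "Proceedings of the ASEE Annual Conference 2004",
    "Digital Libraries: Principles and Practice",
    "Introduction to Federated Search",
]

def exact_title(query):
    return [t for t in CATALOG if t.lower() == query.lower()]

def phrase(query):
    return [t for t in CATALOG if query.lower() in t.lower()]

def all_words(query):
    words = query.lower().split()
    return [t for t in CATALOG if all(w in t.lower() for w in words)]

def any_word(query):
    words = query.lower().split()
    return [t for t in CATALOG if any(w in t.lower() for w in words)]

# Ordered from most to least precise.
STRATEGIES = [
    ("exact title", exact_title),
    ("phrase", phrase),
    ("all words", all_words),
    ("any word", any_word),
]

def best_match(query):
    """Run successive searches and stop at the first one that returns hits."""
    for label, strategy in STRATEGIES:
        hits = strategy(query)
        if hits:
            return label, hits
    return "no match", []
```

Returning the label of the strategy that succeeded makes it possible to explain the result to the user, for example that no exact title was found and an all-words match is being shown instead.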
UIUC Examples
• Conference (Paper) Search:
– Multiple searches within the OPAC for held conference proceedings, plus EI Village for the specific paper, and OCLC Conference Papers. A failed conference search presents similar journal articles.
• Journal Finder:
– Searches the e-resource registry (based on TDNet), local serial databases, and two different OPAC searches for holdings. Searches CrossRef for a DOI full-text link.
• Used in training reference staff and assisting in patron point-of-need services.
Features
• Performing multiple searches within a specific resource in order to arrive at the optimum result set.
• Interprets the user-entered search argument and then routes the query to selected resources: ACM, IEEE.
• Takes the user-entered title search string and checks it against an abbreviation database at the title and word level. Stop words in OPAC.
• Search results are presented as they are returned, or the aggregate results are interpreted and presented with accompanying explanations.
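The abbreviation check and query routing described above might look roughly like this. Everything here is an illustrative assumption: the abbreviation tables, the stop-word list, the keyword-based routing rule, and the default target are invented for the sketch and are far smaller than any real abbreviation database.

```python
# Hypothetical abbreviation data: whole-title level first, then word level.
TITLE_ABBREVIATIONS = {
    "j am chem soc": "Journal of the American Chemical Society",
}
WORD_ABBREVIATIONS = {
    "j": "journal",
    "proc": "proceedings",
    "trans": "transactions",
    "conf": "conference",
    "int": "international",
}
# Assumed stop-word list for the OPAC title search.
STOP_WORDS = {"the", "of", "and", "a", "an", "on", "in", "for"}

def expand_title(raw):
    """Check the whole title first, then expand word by word."""
    cleaned = " ".join(raw.lower().replace(".", " ").split())
    if cleaned in TITLE_ABBREVIATIONS:
        return TITLE_ABBREVIATIONS[cleaned]
    return " ".join(WORD_ABBREVIATIONS.get(word, word) for word in cleaned.split())

def strip_stop_words(title):
    """Drop stop words before sending a title search to the OPAC."""
    return " ".join(w for w in title.split() if w.lower() not in STOP_WORDS)

# Assumed routing rule: a publisher keyword in the query selects a target.
ROUTES = {"acm": "ACM", "ieee": "IEEE"}
DEFAULT_TARGETS = ["OPAC"]

def route(query):
    """Pick target resources from cues in the user-entered search argument."""
    words = query.lower().split()
    targets = [target for key, target in ROUTES.items() if key in words]
    return targets or DEFAULT_TARGETS
```

For example, `expand_title("J. Am. Chem. Soc.")` resolves at the title level, while `expand_title("Proc. IEEE")` falls through to word-level expansion and `route("ieee trans computers")` selects the IEEE target.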