Metasearch Technologies: Definitions, Issues, Reference Applications

William H. Mischo & Mary C. Schlembach
Grainger Engineering Library Information Center
University of Illinois at Urbana-Champaign

Session 2441: Federated Searching
ASEE 2004 National Conference
June 22, 2004


Outline

• Distributed, heterogeneous repositories require federation and linking.

• Metasearch definitions.

• Technologies.

• UIUC RFP process.

• Issues and trends.

• Expanding use of metasearch.

• Custom reference and search applications.

Distributed Information Environment

• We live in a world of multiple, heterogeneous information resources.

– OPACs: local, regional, national shared.
– Locally mounted and remote A & I Services.
– Discrete publisher and vendor full-text repositories.
– E-Resource registries: Serials Solutions, TDNet.
– OAI search services (OAIster, NSDL) and preprint servers.
– Web search engines.
– Vertical publisher and vendor portals (ARL Portal, DOE Information Bridge, Elsevier Scirus & Scopus, EI Village, BioMed Central, Public Library of Science). Surface Web and Hidden Web.
– Institutional Repositories (DSpace).
– Instructional (course) management systems (WebCT, Blackboard).

• David Seaman: ‘we don’t shelve by publisher, why do we expect users to search by publisher?’

Metasearch as a Solution

• Distributed, heterogeneous resources and repositories require federation and linking.

• Terminology: Metasearch, parallel search, federated search, broadcast search, cross-database search, simultaneous search, search portal.

• Definition: search and retrieval that spans multiple databases, sources, platforms, protocols, and vendors at once.

• NISO Metasearch Initiative: emerging standards and best practices.

Value of Metasearch (Pro)

• To recommend and rank specific information resources for users; to facilitate search over multiple information resources.

• “Metasearching and other means of unifying search across heterogeneous products…most significant trend.”

• Deployment of algorithmic searches that mimic the behavior of a reference librarian.

• Integration of E-resources, Local Link Resolvers, and Metasearch.

Value of Metasearch (caveats)

• Metasearch vendor: “metasearch does not provide the robust search functionality of native interfaces.” Examples: thesaurus browsing, e-mailing large (non-displayed) sets.

• LJ editorial 4/1/04, “Do we want or need metasearching?” Content, Limiting, full-text links, User Education, Boolean, Thesauri.

• NISO Workshop: Local Link resolution a mission-critical application; metasearch not yet.

Nomenclature

• Metasearch is also used to refer to systems that search multiple Web search engines built on previously crawled indexes, such as Google, AlltheWeb, AltaVista.

• Examples: EZ2Find, Vivisimo, Dogpile, Kartoo.

• In our world, it refers to systems that work over the distributed information environment dominated by bibliographic resources.

Federated vs. Broadcast Searching

• Federated: heterogeneous information resources are imported or “harvested” (sometimes using OAI-PMH protocols) into a local, central site and the normalized results are placed into a homogeneous database system for search and discovery.

• Examples: EI Village, ISI Web of Knowledge, OAIster, Grainger OAI Search Service, NSDL.
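The federated model can be illustrated with a minimal sketch. It assumes the records have already been harvested (for example over OAI-PMH) from two hypothetical sources, `repo_a` and `repo_b`, each with its own field names. The field maps, the word index, and the functions `normalize`, `build_index`, and `search` are illustrative assumptions, not taken from any of the systems named above.

```python
from collections import defaultdict

# Hypothetical harvested records; each source uses its own field names.
HARVESTED = {
    "repo_a": [{"dc_title": "Metasearch in Engineering Libraries", "dc_creator": "Smith, J."}],
    "repo_b": [{"ti": "Link Resolvers and Metasearch", "au": "Lee, K."}],
}

# Assumed per-source mapping onto one local schema (title, author).
FIELD_MAPS = {
    "repo_a": {"dc_title": "title", "dc_creator": "author"},
    "repo_b": {"ti": "title", "au": "author"},
}


def normalize(source, record):
    """Rename source-specific fields to the local schema and tag the origin."""
    mapping = FIELD_MAPS[source]
    out = {local: record[remote] for remote, local in mapping.items() if remote in record}
    out["source"] = source
    return out


def build_index(harvested):
    """Load all normalized records into one store with a simple title-word index."""
    records = []
    index = defaultdict(set)
    for source, recs in harvested.items():
        for rec in recs:
            norm = normalize(source, rec)
            position = len(records)
            records.append(norm)
            for word in norm.get("title", "").lower().split():
                index[word].add(position)
    return records, index


def search(records, index, term):
    """Search the single local index; no remote system is contacted at query time."""
    return [records[i] for i in sorted(index.get(term.lower(), set()))]


records, index = build_index(HARVESTED)
hits = search(records, index, "metasearch")
```

The point of the sketch is that normalization happens once, at harvest time, so the query runs against a single homogeneous store.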

Federated vs. Broadcast Searching

• Broadcast: user search arguments are sent asynchronously (all at the same time) to remote, distributed systems and the search results are collected, normalized, and displayed to the user.

• Examples: MetaLib, EnCompass for Resource Access, WebFeat.

• Not mutually exclusive. Can do broadcast over federated systems.
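The broadcast model can be sketched in a similar way. The example below assumes three hypothetical connectors (`search_opac`, `search_ai_service`, `search_slow_target`) that stand in for remote targets and return canned data; a real connector would speak Z39.50, HTTP, or a vendor API. The thread pool and the field mappings are assumptions chosen for illustration, not a description of any product listed above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


# Hypothetical connectors: each stands in for one remote target.
def search_opac(query):
    return [{"Title": f"{query} handbook", "Year": "2003"}]


def search_ai_service(query):
    return [{"ti": f"Advances in {query}", "py": "2004"}]


def search_slow_target(query):
    raise TimeoutError("target did not respond")


# Connector function plus an assumed mapping from remote to local field names.
CONNECTORS = {
    "opac": (search_opac, {"Title": "title", "Year": "year"}),
    "ai_service": (search_ai_service, {"ti": "title", "py": "year"}),
    "slow_target": (search_slow_target, {}),
}


def normalize(source, record, mapping):
    """Map one remote record onto the common display schema."""
    out = {local: record.get(remote) for remote, local in mapping.items()}
    out["source"] = source
    return out


def broadcast_search(query, connectors=CONNECTORS):
    """Send the query to every target at once; collect and normalize the results."""
    results, errors = [], {}
    with ThreadPoolExecutor(max_workers=len(connectors)) as pool:
        futures = {
            pool.submit(fn, query): (name, mapping)
            for name, (fn, mapping) in connectors.items()
        }
        for future in as_completed(futures):
            name, mapping = futures[future]
            try:
                for record in future.result():
                    results.append(normalize(name, record, mapping))
            except Exception as exc:
                # One failing target should not sink the whole search.
                errors[name] = str(exc)
    return results, errors


results, errors = broadcast_search("metasearch")
```

Unlike the federated case, normalization here happens at query time, and the order of `results` depends on which target answers first.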

Broadcast Search Basic Technologies

• Z39.50

• HTTP “screen-scraping”

• XML gateway and Web Services.

• Proprietary APIs.

MetaSearch Implementations

• Ex Libris MetaLib.

• Endeavor EnCompass.

• Innovative Interfaces MetaFind.

• MuseGlobal MuseSearch.

• EI Village; ISI Web of Knowledge.

• WebFeat.

• California Digital Library SearchLight system.

• Fretwell-Downing.

• Locals (NCSU, Grainger Library, Los Alamos).

Retrieval Issues

• Pass-through to native interface at point of search departure.

• Coupling of metasearch records with Local Link Resolvers.

– Providing OpenURL enabled links to full-text, other services.

• Merging and De-Duplication.

– Partial de-duping of sequentially retrieved sets.

• Pulling over already extant full-text links from vendor systems.
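Partial de-duplication of sequentially retrieved sets can be sketched as an incremental merge: each result set is folded into the merged list as it arrives, and a record is treated as a duplicate when its match key has been seen before. The match key used here (normalized title plus year) and the names `match_key` and `IncrementalMerger` are assumptions for illustration; production systems typically use richer keys.

```python
import re


def match_key(record):
    """Assumed duplicate test: lowercased title without punctuation, plus year."""
    title = re.sub(r"[^a-z0-9 ]", "", record.get("title", "").lower())
    title = " ".join(title.split())
    return (title, str(record.get("year", "")))


class IncrementalMerger:
    """Merge result sets one at a time, de-duplicating against what is already held."""

    def __init__(self):
        self.merged = []
        self._by_key = {}

    def add_set(self, source, records):
        for record in records:
            key = match_key(record)
            if key in self._by_key:
                # Duplicate: keep the first copy, but remember the extra source.
                self._by_key[key]["sources"].append(source)
            else:
                entry = dict(record, sources=[source])
                self._by_key[key] = entry
                self.merged.append(entry)
        return self.merged


merger = IncrementalMerger()
merger.add_set("opac", [{"title": "Metasearch: A Primer", "year": 2004}])
merger.add_set("ai_service", [
    {"title": "metasearch a primer", "year": "2004"},
    {"title": "Link Resolvers", "year": 2003},
])
```

The de-duplication is partial in the sense that only records retrieved so far are compared; records still sitting unretrieved on a remote target cannot be matched.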

Technology Issues

• Consortium-based implementations.

• Search Statistics (COUNTER compliance).

• Vendor concerns with supporting multiple metasearch sessions – throwing a logoff to kill a session.

• Search query standards – SRW/SRU, XQuery, OpenURL, one-step URL-launch searches.
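Two of the URL-based standards above can be illustrated by building request URLs. The sketch assumes hypothetical endpoints (`sru.example.org`, `resolver.example.edu`) and a placeholder ISSN; the parameter names follow the SRU `searchRetrieve` operation and the OpenURL key/encoded-value journal format as commonly documented, and should be checked against the target's own documentation.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start=1, maximum=10, version="1.1"):
    """Build an SRU searchRetrieve request carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return f"{base_url}?{urlencode(params)}"


def openurl_for_article(resolver_url, issn, volume, start_page, year):
    """Build an OpenURL (key/encoded-value form) pointing a link resolver at an article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.spage": start_page,
        "rft.date": year,
    }
    return f"{resolver_url}?{urlencode(params)}"


# Hypothetical endpoints and a placeholder citation, for illustration only.
sru_url = sru_search_url("https://sru.example.org/catalog", 'dc.title = "metasearch"')
link = openurl_for_article(
    "https://resolver.example.edu/openurl", "1234-5678", "12", "45", "2004"
)
```

Both are one-step URL launches: the whole search or citation travels in the query string, so no session has to be opened on the target first.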

Future and Custom Applications

• Time of rapid development and growth in Metasearch applications. Expect continuing evolution.

• Metasearch technology fairly easy to implement locally over selected resources.

• Focusing on apps that allow custom Best-Match and algorithmic searching that mimics a reference librarian.

Our Approach

• User interface and discovery systems that emphasize function or needs-based approaches to retrieval. Reference and Known-item.

• Metasearch technologies that offer additional opportunities beyond simultaneous search of discrete A & I Services.

• Performing multiple searches within individual resources to determine “Best-Match” search results. Combined with selected simultaneous search of other resources.
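One way to read the “Best-Match” idea is as a cascade: run several searches of decreasing precision against the same resource and stop at the first one that returns anything. The sketch below is an assumption about how such a cascade could look, not the UIUC implementation; the in-memory `CATALOG`, the four strategies, and their order are all hypothetical.

```python
# Hypothetical in-memory stand-in for a single searchable resource.
CATALOG = [
    "Proceedings of the ASEE Annual Conference 2004",
    "Journal of Engineering Education",
    "Engineering Education Review",
]


def exact_title(query, titles):
    return [t for t in titles if t.lower() == query.lower()]


def phrase(query, titles):
    return [t for t in titles if query.lower() in t.lower()]


def all_words(query, titles):
    words = query.lower().split()
    return [t for t in titles if all(w in t.lower().split() for w in words)]


def any_word(query, titles):
    words = set(query.lower().split())
    return [t for t in titles if words & set(t.lower().split())]


# Ordered from most to least precise.
STRATEGIES = [
    ("exact title", exact_title),
    ("phrase", phrase),
    ("all words", all_words),
    ("any word", any_word),
]


def best_match(query, titles=CATALOG):
    """Try each strategy in turn and return the first non-empty result set."""
    for label, strategy in STRATEGIES:
        hits = strategy(query, titles)
        if hits:
            return label, hits
    return None, []
```

Returning the strategy label alongside the hits lets the interface explain to the user how the result set was obtained.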

UIUC Examples

• Conference (Paper) Search:

– Multiple searches within OPAC for held conference proceedings + EI Village for specific paper and OCLC Conference Papers. Failed conference search presents similar journal articles.

• Journal Finder:

– Searches e-resource registry (based on TDNet), local serial databases, two different OPAC searches for holdings. Searches CrossRef for DOI full-text link.

• Used in training reference staff and assisting in patron point-of-need services.

Features

• Performing multiple searches within a specific resource in order to arrive at the optimum result set.

• Interpret the user-entered search argument and then route the query to selected resources: ACM, IEEE.

• Takes user-entered title search string and checks against an abbreviation database at the title and word level. Stop words in OPAC.

• Search results are presented as they are returned, or the aggregate results are interpreted and presented with accompanying explanations.
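The query interpretation steps above (abbreviation lookup at the title and word level, stop-word removal for the OPAC, routing to selected resources) can be sketched as follows. Every table here is a small hypothetical sample, and the routing rule (look for a publisher cue word in the expanded title) is an assumption made for illustration rather than the rule the UIUC applications use.

```python
# Hypothetical sample tables; a real abbreviation database would be far larger.
TITLE_ABBREVIATIONS = {"jacs": "journal of the american chemical society"}
WORD_ABBREVIATIONS = {
    "j.": "journal", "j": "journal",
    "proc.": "proceedings", "proc": "proceedings",
    "trans.": "transactions", "trans": "transactions",
    "eng.": "engineering", "eng": "engineering",
}
STOP_WORDS = {"the", "of", "and", "on", "in", "a", "an", "for"}
# Assumed routing cues and target labels.
ROUTES = {"ieee": "IEEE Xplore", "acm": "ACM Digital Library"}


def expand_title(raw):
    """Check the whole string first (title level), then each word (word level)."""
    key = raw.strip().lower()
    if key in TITLE_ABBREVIATIONS:
        return TITLE_ABBREVIATIONS[key]
    return " ".join(WORD_ABBREVIATIONS.get(word, word) for word in key.split())


def opac_keywords(title):
    """Drop stop words before sending a keyword search to the OPAC."""
    return [word for word in title.split() if word not in STOP_WORDS]


def route(title, default=("OPAC",)):
    """Pick targets whose cue word appears in the title; fall back to the default."""
    words = title.split()
    targets = [target for cue, target in ROUTES.items() if cue in words]
    return targets or list(default)


expanded = expand_title("IEEE Trans. on Eng. Management")
keywords = opac_keywords(expanded)
targets = route(expanded)
```

With these sample tables, a query that names a publisher is sent to that publisher's resource, and anything else falls through to the OPAC.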