Introduction to CNIDR’s Isearch
Archie Warnock (warnock@clark.net)
A/WWW Enterprises
Who is MCNC/CNIDR?
MCNC = Microelectronics Center of North Carolina
CNIDR = Clearinghouse for Networked Information Discovery and Retrieval
  Originally funded by NSF to coordinate and produce network information tools
  Now developing public domain and commercial search/retrieval tools
What is Isearch?
Isearch is the successor to freeWAIS
Isearch is a sophisticated full-text search and retrieval system
Isearch is a component of Isite, an implementation of the NISO standard protocol Z39.50 for information search and retrieval
  ftp://ftp.cnidr.org/pub/NIDR.tools/Isearch
  http://vinca.cnidr.org/software/Isearch/Isearch.html
Terminology - I
Client/server - an architecture to allow communications between programs, possibly on different computers
Protocol - the communication “language” used by client and server programs
HTTP - the protocol used by WWW clients and servers
CGI - mechanism to process WWW forms
Terminology - II
Query - user-supplied search criteria
Full-text search - word-based search of all the text in a document
Fielded search - word-based search of text within only certain fields in a document
Z39.50 - a standard protocol for network-based document search and retrieval
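The difference between full-text and fielded search can be sketched in a few lines of Python. This is only an illustration of the two terms: the record, its field names, and the helper functions are hypothetical and are not part of Isearch.

```python
# Hypothetical record; the field names are illustrative, not an Isearch schema.
record = {
    "title": "Eigenvalues of sparse matrices",
    "author": "A. Smith",
    "abstract": "We study bounds on eigenvalues using graph methods.",
}

def words(text):
    """Lower-case a string and split it into words, dropping edge punctuation."""
    return [w.strip(".,;:!?").lower() for w in text.split()]

def full_text_match(rec, term):
    """True if the term occurs as a word in any field of the record."""
    return any(term.lower() in words(value) for value in rec.values())

def fielded_match(rec, field, term):
    """True if the term occurs as a word in the named field only."""
    return term.lower() in words(rec.get(field, ""))
```

Here `full_text_match(record, "graph")` finds the word because it looks at every field, while `fielded_match(record, "title", "graph")` does not, since the word appears only in the abstract.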
System Components - I
Iindex, the Text Indexer - builds searchable version of the document collection
  Implements fast word-based searching
  Document parser - recognizes start/end of individual documents
  Field parser - recognizes start/end of fields within individual documents
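Fast word-based searching usually relies on some form of inverted index, a table from each word to the documents that contain it. The slides do not describe Iindex's actual index format, so the following is a generic sketch with a hypothetical `build_index` helper, not a description of how Iindex works internally.

```python
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?")].add(doc_id)
    return index

# Hypothetical three-document collection.
docs = {
    1: "Matrix norms and eigenvalues",
    2: "Sparse matrix factorization",
    3: "Graph theory notes",
}
index = build_index(docs)
```

Once the index is built, a lookup such as `index["matrix"]` is a single dictionary access rather than a scan of every document.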
System Components - II
Isearch, the Search engine - searches a document collection based on user-supplied query
  Command line search
    Primarily used for testing
  WWW gateway (using CGI)
    End-user interface using forms
  Z39.50 gateway
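A forms-based gateway receives the user's input as a URL-encoded query string and has to turn it into a search request. The sketch below shows that step using Python's standard library; the parameter names `field` and `term` are assumptions made for illustration and are not taken from the Isearch CGI gateway.

```python
from urllib.parse import parse_qs

def parse_search_form(query_string):
    """Turn a form's query string into a (field, term) pair."""
    params = parse_qs(query_string)
    # Hypothetical parameter names; default to searching all fields.
    term = params.get("term", [""])[0]
    field = params.get("field", ["any"])[0]
    return field, term
```

For example, `parse_search_form("field=title&term=sparse+matrix")` yields the field `title` and the decoded term `sparse matrix`.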
Isearch Capabilities
Fast full-text search
  US AIDS Patent Collection - can search ~250,000 patents in < 1 second
Fielded search
  Can restrict searches to title, author, abstract, other fields
Relevance ranking
  Search “hits” are assigned scores & sorted
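Relevance ranking means scoring each hit and sorting by score. The slides do not give Isearch's scoring formula, so the sketch below substitutes the simplest possible one, a count of query-term occurrences; the `score` and `rank` functions and the sample documents are hypothetical.

```python
def score(text, query_terms):
    """Count how many times the query terms occur in the text."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return sum(words.count(term.lower()) for term in query_terms)

def rank(documents, query_terms):
    """Return (doc_id, score) pairs for matching documents, best first."""
    hits = [(doc_id, score(text, query_terms)) for doc_id, text in documents.items()]
    hits = [hit for hit in hits if hit[1] > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical collection of three short documents.
docs = {
    "a": "Patent for a vaccine. The vaccine targets a virus.",
    "b": "Patent for a diagnostic test.",
    "c": "A vaccine delivery device.",
}
results = rank(docs, ["vaccine"])
```

Document `a` mentions the term twice and sorts ahead of `c`, which mentions it once; `b` has no match and is dropped from the results.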
Isearch Capabilities
Word truncation
  Search for “matri*” matches “matrix” and “matrices”
Boolean functions
  AND, OR and ANDNOT combinations of different fields
Customized presentation of results
Phrase searching (coming soon)
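Truncation and Boolean operators map naturally onto prefix matching and set operations over a word index. The following is a minimal sketch under that assumption; the tiny `index` and the `lookup` helper are hypothetical, field restriction is left out, and Isearch's own query syntax is not reproduced here.

```python
# Hypothetical word index: word -> ids of documents containing it.
index = {
    "matrix": {1, 2},
    "matrices": {3},
    "material": {5},
    "sparse": {2, 3, 4},
    "dense": {1},
}

def lookup(index, term):
    """Return matching doc ids; a trailing '*' requests a truncation (prefix) search."""
    if term.endswith("*"):
        prefix = term[:-1]
        matches = set()
        for word, doc_ids in index.items():
            if word.startswith(prefix):
                matches |= doc_ids
        return matches
    return set(index.get(term, set()))

matri = lookup(index, "matri*")   # matches "matrix" and "matrices", not "material"
sparse = lookup(index, "sparse")

both = matri & sparse         # AND
either = matri | sparse       # OR
only_matri = matri - sparse   # ANDNOT
```

With sets as the result type, each Boolean operator is a one-line set operation on the results of two simpler searches.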
Isearch Customization
What’s needed to customize Isearch?
  Isearch is written in C++
  Documents are C++ objects - data & procedures
    Already have SGML & HTML, among others
  Object technology allows code reusability, customizing only where differences from existing objects occur
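The idea of customizing only where a new document type differs can be sketched with a small class hierarchy. Isearch itself is written in C++; Python is used here only to keep the sketch short, and the class and method names are hypothetical rather than Isearch's own.

```python
import re

class Document:
    """Base document type: treats the whole text as one unnamed field."""

    def parse_fields(self, text):
        return {"text": text}

    def brief_record(self, text):
        # Shared presentation logic, reused unchanged by every subclass.
        fields = self.parse_fields(text)
        return fields.get("title", text[:40])

class TaggedDocument(Document):
    """Overrides only field parsing; presentation is inherited."""

    def parse_fields(self, text):
        pairs = re.findall(r"<(\w+)>(.*?)</\1>", text, flags=re.DOTALL)
        return {tag.lower(): value.strip() for tag, value in pairs}
```

`TaggedDocument` adds one method and inherits `brief_record`, so supporting a new tagged format means writing only the parsing step that differs.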
Isearch Customization
What’s needed to make arbitrary documents searchable?
  Code to parse documents
  Code to parse fields
  Code to build brief and full result records
Yes, it requires programming
But, many of these are derived from existing procedures
Customization Example - Linear Algebra
Inputs
  SGML-tagged bibliographic records
  TeX preprints
Requirements
  Field searching on title, author, abstract
  Full-text search of preprints
  WWW-based interface
Customization Example - Linear Algebra
End products
  HTML-tagged “brief records” - title, author and links to full bibliographic records and preprints
  HTML formatted bibliographic records for display in WWW browser
  Preprints for display or retrieval to local storage
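A brief record of this kind is a short piece of HTML built from a hit's fields. The sketch below shows one way to format it; the function name, the list-item layout, and the URL arguments are assumptions for illustration, not the markup the project actually produced.

```python
from html import escape

def brief_record_html(title, author, record_url, preprint_url):
    """Format one search hit as an HTML list item with two links."""
    return (
        f"<li>{escape(title)}, {escape(author)} "
        f'[<a href="{escape(record_url, quote=True)}">record</a>] '
        f'[<a href="{escape(preprint_url, quote=True)}">preprint</a>]</li>'
    )
```

Escaping the title and author keeps characters such as `&` or `<` in a bibliographic record from breaking the generated page.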
Customization Example - Linear Algebra
Sample Bibliographic Record
<BB>
<AID>####</AID>
<VOL>##</VOL>
<ISS>##</ISS>
<ATL>Title text</ATL>
<AUG> <AU>Author Name</AU> </AUG>
<ABS>Abstract text</ABS>
<PPX>Preprint.filename</PPX>
<PGR>###-###</PGR>
</BB>
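Field parsing for a record like this amounts to locating each start and end tag. The following sketch pulls fields out of the sample record with regular expressions; the helper names are hypothetical, a real SGML parser driven by the DTD would be more robust, and this is not the code Isearch uses.

```python
import re

# The sample record from the slide, with its placeholder values.
SAMPLE = (
    "<BB><AID>####</AID><VOL>##</VOL><ISS>##</ISS>"
    "<ATL>Title text</ATL><AUG> <AU>Author Name</AU> </AUG>"
    "<ABS>Abstract text</ABS><PPX>Preprint.filename</PPX>"
    "<PGR>###-###</PGR></BB>"
)

def field(record, tag):
    """Return the stripped text of the first <tag>...</tag> element, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", record, flags=re.DOTALL)
    return match.group(1).strip() if match else None

def authors(record):
    """Return every <AU> entry; a record may list several authors."""
    found = re.findall(r"<AU>(.*?)</AU>", record, flags=re.DOTALL)
    return [name.strip() for name in found]
```

With these helpers, `field(SAMPLE, "ATL")` returns the title placeholder and `authors(SAMPLE)` returns a one-element list, which is the kind of per-field text a fielded index needs.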
Customization Example - Linear Algebra
Isearch Modifications
  ~1 week coding and testing, mostly in developing presentation customizations
  Additional work to develop ingest and on-the-fly formatting scripts, code deployment at ESI
  Now have basic code to handle SGML documents using Elsevier DTD