The Future of Isite - Growing GILS Archie Warnock A/WWW Enterprises warnock@clark.net


Page 1

The Future of Isite - Growing GILS

Archie Warnock

A/WWW Enterprises

http://www.clark.net/pub/warnock/awww.html

warnock@clark.net

Page 2

What Is Isite?

Isite is a standards-based Internet toolkit for information search and retrieval (Z39.50)

Isite was developed by MCNC/CNIDR

Isite was intended as a replacement for freeWAIS

Funded by a US NSF grant

There are other good Z39.50 toolkits, too

Page 3

Isite Architecture

Isite is written in C++ to utilize the usual object-oriented advantages

Major components
    Isearch - the search and retrieval engine
    SAPI - the Z39.50 search engine API
    Zdist - the Z39.50 implementation

Page 4

Isite Architecture - Example Programs

Iindex, Isearch, Iutil - the search engine

Isearch-cgi - the CGI gateway to Isearch

zclient, izclient, zping, zbatch - the Z39.50 clients

zserver, zserverNT - the Z39.50 servers

zcon & zgate - the WWW-to-Z39.50 gateway

Page 5

Current Status of Isite

MCNC/CNIDR funding from NSF is finished
    Successful completion of 3 year grant
    Jim Fullton, PI, is now at WIPO in Geneva
    No additional support is anticipated

Other projects are supporting customization
    FGDC, US Dept. of Commerce, US Patent & Trademark Office, CEO, STScI, World Bank, BSn

Page 6

Isite Strengths

Powerful and flexible search engine

Community-based development of a reference implementation

Freely distributed and widely available for any use

Source code included

Powerful search engine interface

Ported to Windows NT with threaded Z39.50 server

Page 7

Isearch Features

Full text search

Search on text fields

Search on numeric fields with appropriate relations (>, <, =)

Search on date fields with appropriate relations (before, during, after)

Search on geospatial bounding box

Boolean searches

Phrase searching

Right truncation

Proximity searching (within N characters)

Case insensitive searching, punctuation ignored

Configurable stopword list

Customizable results presentation

Relevance ranked scores

Term weighting
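To make a few of the features above concrete, here is a minimal Python sketch of case-insensitive matching with punctuation ignored, right truncation, and proximity searching within N characters. It is only an illustration of the ideas: it does not reproduce Isearch's index structures or query syntax, and the function names and the trailing `*` truncation marker are assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase each word, skip punctuation, and keep the word's character offset."""
    return [(m.group().lower(), m.start()) for m in re.finditer(r"[A-Za-z0-9]+", text)]

def matches(word, pattern):
    """Right truncation (assumed syntax): 'inform*' matches any word starting with 'inform'."""
    pattern = pattern.lower()
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern

def positions(tokens, pattern):
    """Character offsets of every token that matches the pattern."""
    return [pos for word, pos in tokens if matches(word, pattern)]

def near(text, left, right, max_chars):
    """Proximity search: do the two terms start within max_chars characters of each other?"""
    tokens = tokenize(text)
    return any(abs(a - b) <= max_chars
               for a in positions(tokens, left)
               for b in positions(tokens, right))

doc = "Isite is a toolkit for Information search and retrieval."
print(near(doc, "inform*", "retriev*", 30))  # the two words start 23 characters apart
print(near(doc, "inform*", "retriev*", 10))
```

A real engine would answer these queries from a prebuilt index rather than rescanning the text on every search; the sketch trades that efficiency for brevity.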

Page 8

Isearch Document Types

ASCII text

USMARC records

Electronic mail folders

Usenet news archives

US patents

IAFA templates

BIBTeX

Filenames

First line in file

SGML tagged fields

HTML

GILS templates

FGDC templates

Colon delimited fields

GCMD DIF templates

whois++ templates

Multi-file documents

Medline
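As an illustration of what a fielded document type involves, the following Python sketch splits a colon-delimited record into named fields that could then be searched individually. The slides do not give the exact rules Isearch applies to this document type, so the format assumed here (one `Name: value` pair per line, indented lines continuing the previous field) and the field names in the sample are hypothetical.

```python
def parse_colon_fields(text):
    """Split 'Name: value' lines into a dict (assumed format, not Isearch's parser).

    Indented lines are treated as continuations of the previous field.
    A repeated field name overwrites the earlier value.
    """
    record = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and current is not None:
            record[current] += " " + line.strip()
        elif ":" in line:
            name, value = line.split(":", 1)  # split on the first colon only
            current = name.strip()
            record[current] = value.strip()
    return record

# Hypothetical sample record; the field names are chosen for illustration.
sample = """Title: The Future of Isite
Author: Archie Warnock
Abstract: Isite is a standards-based toolkit
  for information search and retrieval.
"""

fields = parse_colon_fields(sample)
print(fields["Abstract"])
```

Splitting on the first colon only keeps values such as URLs intact.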

Page 9

Isite Weaknesses

Modest Z39.50 implementation
    needs GRS-1
    better USMARC support
    data structures

All examples are console applications

No real end-user applications

No GUI interface

Difficult configuration

Requires programming for extensions

Needs optimization & performance enhancement

Needs more documentation

Page 10

What The Future Holds For Isite

New Projects want (and will get):
    Distributed document collections
    Distributed searching
    Automated information extraction (centroids, templates)
    Searching and referrals
    Additional Z39.50 support (lots of Z39.50 details are not supported now)

Page 11

GILS and the Advanced Search Facility

ASF is a US Dept. of Commerce project, to be built by Pilot Research, MCNC and A/WWW Enterprises

“GILSnet” - a network of cooperative, low-impact, distributed nodes

The basic interchange will be GILS templates

Search on full text and GILS records

Page 12

GILS, Dublin Core and Everyone Else

Dublin Core is a minimal (15 fields) generic metadata scheme for virtually any kind of document

GILS represents a more detailed approach, including most of DC, providing greater interoperability

GILS is less bibliographically oriented than BIB-1

GILS is lightweight compared to GEO and CIP (which have specific functional requirements)
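The relationship between a minimal scheme and a more detailed one can be sketched in Python as a projection of a richer record onto the 15 Dublin Core elements. The element names below are the standard Dublin Core set; the sample record and the field mapping are assumptions made for illustration and are not an official GILS-to-DC crosswalk.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

def to_dublin_core(record, mapping):
    """Project a richer record onto Dublin Core using a source-field -> element mapping."""
    dc = {}
    for source_field, element in mapping.items():
        if element not in DUBLIN_CORE_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        if source_field in record:
            dc[element] = record[source_field]
    return dc

# Hypothetical record in a more detailed scheme (illustrative field names and values).
detailed_record = {
    "Title": "The Future of Isite - Growing GILS",
    "Originator": "A/WWW Enterprises",
    "Abstract": "Status and future directions of the Isite toolkit.",
    "Point of Contact": "warnock@clark.net",
}

# Assumed mapping, for illustration only.
mapping = {"Title": "title", "Originator": "creator", "Abstract": "description"}

dc_record = to_dublin_core(detailed_record, mapping)
print(dc_record)
```

Fields with no counterpart in the mapping, such as the contact field here, are simply dropped, which is the loss of detail a leaner scheme accepts in exchange for generality.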

Page 13

What GILS Means To Me - 1

Fewer fields                  More fields
More documents                Fewer documents
More metadata records         Fewer metadata records
Skinnier metadata records     Fatter metadata records
Easier abstraction            Less abstraction

GILS is a good, general compromise

Page 14

What GILS Means To Me - 2

Think of the GILS profile as defining a language

At some level, Z39.50 is a detail

Protocols are about communication, profiles are about abstraction and GILS is about content

Z39.50 guarantees that the user’s query can be unambiguously decoded - no guarantees about content

We could implement the profile over any protocol - http, CORBA, etc.

Does GILS have to use Z39.50?
    No, but the abstraction is required
    Z39.50 already includes the abstraction model
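The separation between the abstraction and the protocol can be sketched in Python: one abstract query is rendered two ways, once as a Z39.50-style attribute query and once as an HTTP query string. This is only an illustration under stated assumptions. The numbers 4 (Title) and 1016 (Any) are Bib-1 Use attribute values, and the textual rendering follows the Prefix Query Format used by some Z39.50 toolkits; the `AbstractQuery` class, the function names, the HTTP parameter names and the example URL are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

@dataclass(frozen=True)
class AbstractQuery:
    """A protocol-independent query: search this abstract field for this term."""
    field: str
    term: str

# Bib-1 Use attribute values for two abstract access points.
USE_ATTRIBUTES = {"title": 4, "any": 1016}

def to_pqf(query):
    """Render the query in Prefix Query Format, a text notation for Z39.50 queries."""
    return f'@attr 1={USE_ATTRIBUTES[query.field]} "{query.term}"'

def to_http(query, base_url):
    """Render the same query as an HTTP GET URL (parameter names are hypothetical)."""
    return base_url + "?" + urlencode({"field": query.field, "term": query.term})

q = AbstractQuery(field="title", term="growing gils")
print(to_pqf(q))
print(to_http(q, "http://example.org/search"))
```

Both renderings carry the same meaning only because client and server agree on what "title" denotes; that shared agreement is the part a profile supplies, whatever protocol carries the query.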

Page 15

Related Documents

Getting Isite
    ftp://ftp.cnidr.org/pub/software/Isite
    ftp://ftp.clark.net/pub/warnock/Software (pre)

A/WWW Enterprises
    warnock@clark.net
    http://www.clark.net/pub/warnock/awww.html
    US Phone/FAX: 301-854-2987