ETD Search Services Ming Luo Edward A. Fox ([email protected]) Virginia Tech.


ETD Search Services

Ming Luo

Edward A. Fox ([email protected])

Virginia Tech

Acknowledgements (selected)

Support: Adobe, AOL, DFG, NSF (DUE-0333531, 0136690, 0121679; IIS-0086227, 0080748), OCLC, UNESCO, VTLS

Colleagues: Vinod Chachra, Tom Dehn, Marcos Gonçalves, Thom Hickey, Aaron Krowne, Ming Luo, Gail McMillan, Hussein Suleman, Jeff Young

Where are the data coming from?

From this community! Please join us and share your data!

OCLC NDLTD Union
– Metadata for 112,652 ETDs
– 44 institutions
– Will be able to provide 3 formats: DC, ETDMS, MARC21

Where are the data coming from?

OCLC (Research) contacts are
– Thom Hickey [[email protected]]
– Jeff Young [[email protected]]
– Tom Dehn [[email protected]]

VTLS has some additional data sources
– Providing data other than through OAI-PMH
– Including in Korean and Greek
– Contact is Vinod Chachra [[email protected]]

Institutions in OCLC NDLTD Union Set (1)

Institutions in OCLC NDLTD Union Set (2)

OCLC SRU

What is SRU?
– Search and Retrieve URL Service (SRU) is a web-service-based protocol for searching databases
– Derived from Z39.50
– Uses the Common Query Language (CQL)
– Current version: V1.1, 13th February 2004
– Three basic operations: explain, scan, and searchRetrieve
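As a rough illustration of the three operations, the sketch below builds SRU request URLs using the standard SRU 1.1 parameter names (operation, version, query, scanClause, startRecord, maximumRecords). The endpoint sru.example.org and the helper name sru_url are assumptions made for this example; they are not the OCLC service's actual address or code.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://sru.example.org/ndltd"

def sru_url(operation, base_url=BASE_URL, version="1.1", **params):
    """Build an SRU request URL for explain, scan, or searchRetrieve."""
    if operation not in ("explain", "scan", "searchRetrieve"):
        raise ValueError(f"unknown SRU operation: {operation}")
    query = {"operation": operation, "version": version}
    query.update(params)  # operation-specific parameters, e.g. query or scanClause
    return f"{base_url}?{urlencode(query)}"

# One request per basic operation.
explain = sru_url("explain")
scan = sru_url("scan", scanClause='dc.title = "cat"')
search = sru_url("searchRetrieve", query='dc.title = "cat"',
                 startRecord=1, maximumRecords=10)

# Decode the CQL query back out of the URL to confirm the encoding round-trips.
print(parse_qs(urlparse(search).query)["query"][0])
```

The CQL query travels as an ordinary URL parameter, so the only client-side work is percent-encoding it; the server answers with XML.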

CQL Examples (from http://www.loc.gov/z3950/agency/zing/cql/)

dc.title
cql.stem
dc.title = "cat"
cat
author = "smith"
dc.title any "cat"
bath.author cql.exact "smith, j."
dc.title any/relevant/rel.CORI "cat fish"
dc.author exact/stem "smith, j."
dc.title = cat
"<element>"
dc.title = "cat" and bath.author = "smith"
"cat" or hat
dc.title = "cat" prox/distance=1/unit=word dc.title = "in"
"cat" prox/distance>2/ordered "hat"
dc.title=cat and/rel.sum dc.title=dog
> dc="http://www.dublincore.org/" dc.title = "cat"
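Queries like the ones above can be assembled programmatically. The following is a minimal sketch, assuming that a term is always double-quoted and that embedded quotes and backslashes are escaped with a backslash. The helper names quote_term, clause, and combine are hypothetical, and relation modifiers and prox are left out.

```python
def quote_term(term):
    """Double-quote a CQL term, escaping backslashes and embedded quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def clause(index, relation, term):
    """Build a single search clause such as: dc.title = "cat"."""
    return f"{index} {relation} {quote_term(term)}"

def combine(operator, *clauses):
    """Join clauses with a boolean operator, parenthesizing each clause."""
    if operator not in ("and", "or", "not"):
        raise ValueError(f"unsupported boolean operator: {operator}")
    return f" {operator} ".join(f"({c})" for c in clauses)

title = clause("dc.title", "=", "cat")
author = clause("bath.author", "=", "smith")
print(combine("and", title, author))
```

Parenthesizing every clause is a deliberate simplification: it avoids relying on operator precedence when clauses are nested.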

OCLC SRU Interface

VTLS Search

Based on Virtua system from
– VTLS (www.vtls.com)
– Visionary Technology in Library Solutions

Developed in C++

Uses Oracle Database

Virtua User Interface

– Scan Search
– Key Word Search
– Expert Search

VTLS Union Catalog Content Languages

The VTLS NDLTD Union Catalog has data in 6 different languages: English, German, Greek, Korean, Portuguese, and Spanish.

Examples follow

Language = German; hits = 137

Full record display

Virginia Tech ETD Union

– Componentized Digital Library Software
– Uses OCLC’s OAI data provider
– Mirrored in China by CALIS
– About 200 queries and 400 pages per day for the past year, and usage is increasing

The World According to OAI

[Figure: service providers (discovery, current awareness, preservation) obtain records from data providers through metadata harvesting.]
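To make the harvesting step concrete, here is a minimal sketch of what a service provider does with an OAI-PMH ListRecords response: build the request URL, read the record identifiers, and pick up the resumptionToken for the next page. The verb, the oai_dc metadata prefix, and the XML namespace come from OAI-PMH 2.0; the base URL, the sample response, and the function names are assumptions for illustration, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in
                   root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token

# Made-up response fragment standing in for a data provider's reply.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:etd-1</identifier>
      <datestamp>2004-01-15</datestamp></header></record>
    <record><header><identifier>oai:example.org:etd-2</identifier>
      <datestamp>2004-02-01</datestamp></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
print(ids, next_url)
```

A real harvester would repeat the request with each new token until the response carries none.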

VT ETD Union System Diagram

[Figure: system diagram. Labels: User Interface (Browse, Search, What’s New); OCLC data provider; Open Digital Library Protocol (Extended OAI-PMH); Protocol for Metadata Harvesting; Open Digital Library Component (Extended Open Archive); Open Archive.]

Open Digital Library Components

Running now
– XML-File (data provider from file system)
– Search: simple or in-memory (Essex) or generalized
– Union, browse, recent, filter
– E-journal/review, Submit, Edit, Annotation
– Recommender, Rating; Mirroring (see JCDL’02)
– Working with NCSA: from DB, unstructured text

Others in process
– Classification/categorization
– Registry (and other connections with web services)
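The idea behind the union and recent components can be sketched in a few lines: merge records harvested from several data providers, keep the newest copy of each identifier, and list the most recently updated items. This is an illustrative sketch only, not the Open Digital Library code; the record layout (dicts with identifier, datestamp, and title) and the function names are assumptions.

```python
def union(*collections):
    """Merge record lists, keeping the newest record per identifier."""
    merged = {}
    for records in collections:
        for rec in records:
            current = merged.get(rec["identifier"])
            # ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
            if current is None or rec["datestamp"] > current["datestamp"]:
                merged[rec["identifier"]] = rec
    return merged

def recent(merged, n=2):
    """Return the n most recently updated records, newest first."""
    return sorted(merged.values(), key=lambda r: r["datestamp"], reverse=True)[:n]

# Made-up records from two hypothetical data providers.
provider_a = [
    {"identifier": "oai:a.example:1", "datestamp": "2004-01-10", "title": "Thesis A1"},
    {"identifier": "oai:a.example:2", "datestamp": "2004-03-05", "title": "Thesis A2"},
]
provider_b = [
    {"identifier": "oai:b.example:7", "datestamp": "2004-02-20", "title": "Thesis B7"},
    {"identifier": "oai:a.example:1", "datestamp": "2004-04-01", "title": "Thesis A1 (revised)"},
]

catalog = union(provider_a, provider_b)
print([r["title"] for r in recent(catalog)])
```

Keying on the identifier means a record re-harvested after revision replaces its older copy instead of appearing twice.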

[Figure: open digital library. Labels: Program, Document, Image, Video, and ETD-1 through ETD-4 objects; OA (open archive) nodes connected by PMH and XPMH.]

Example Open Digital Library

ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)

[Figure: students and researchers use a USER INTERFACE built on the Search, Browse, Recent, Union, and Filter services (ODLSearch, ODLBrowse, ODLRecent, ODLUnion), which connect via PMH to the ETD collections.]

ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn – popular site!)

Quality of Search Services

Service                         Composability  Efficiency  Effectiveness
OCLC SRU                        Medium         High        High
VTLS                            Low            High        Medium
Virginia Tech ETD Union Search  High           Medium      Medium

Comparison of Software

Service                         Software Used           Software License              Price of Software                        Full Text Search
OCLC SRU                        Homegrown               N/A                           N/A                                      No
VTLS                            VIRTUA (VTLS.com)       Commercial                    Depends on user number, collection size  No
Virginia Tech ETD Union Search  Open Digital Library    BSD-like Open Source License  Free                                     No
Virginia Tech ETD collection    Ultraseek (Verity.com)  Commercial                    Depends on user number, collection size  Yes

Next Steps with VT ETD Union

Web Services based component

Easier user interface configuration

Better precision of search results; full-text?

Research studies (e.g., Ryan Richardson dissertation)
– Studies of collections and genre
– Summaries using concept maps
– Cross-language retrieval

References:

Z39.50 International - Next Generation: http://www.loc.gov/z3950/agency/zing

VT Service: http://rocky.dlib.vt.edu/~etdunion/cgi-bin/OCLCUnion/UI/index.pl

VTLS Service: http://www.vtls.com/ndltd

OCLC Service (SRU): http://alcme.oclc.org/ndltd/SearchbySru.html

Thank You!

Paper with more details is available at URL: http://tennessee.cc.vt.edu/~lming/software/ETDSearchServices0.7.doc

DLRL: www.dlib.vt.edu, http://dlbox.nudl.org/

Fox: http://fox.cs.vt.edu