Lecture 5 A research perspective on Digital Libraries
CS 502 Computing Methods for Digital Libraries
Cornell University – Computer Science
Herbert Van de Sompel
DL Ancestry
[Figure 3: Digital Library System Ancestry. A timeline (1991 to 1994) with a "Current Status" column. Systems shown: CORE, DLI, Physics e-Print Server, UCSTRI, CS-TR, WATERS, NCSTRL, LTRS (TRSkit), NTRS, STELAR, ADS. Status annotations in the figure: "Still operational, but no longer being developed."; "Has also branched into many sub-fields of Physics, as well as Mathematics and Chemistry."; "Other databases spun off (Physics / Geophysics, Space Instrumentation)"; seven entries are marked "Still In Use".]
URLs to some of these DLs
ADS: http://adswww.harvard.edu/
NCSTRL: http://www.ncstrl.org
UCSTRI: http://www.cs.indiana.edu:800/cstr/cover.html
arXiv: http://arXiv.org
LTRS: http://techreports.larc.nasa.gov/ltrs/
NTRS: http://techreports.larc.nasa.gov/cgi-bin/NTRS
DL Architectural Review
Assumptions made in this perspective
– things start with TCP/IP connectivity
– distribute full content (reports, software, etc.)
    • not only metadata
DL Architecture History approach 1
1. Build a special client and server (generally using Motif/X11, Tcl/Tk, etc.), and use TCP/IP as the transport protocol only
• pros: rich functionality
• cons: high development cost, client distribution problem
• observation: many of these projects spent more time building the interfaces, protocols, searching, etc. than populating their DL!
DL Architecture History approach 2
2. Use standard protocols built upon TCP/IP: SMTP, FTP, Gopher, WAIS, HTTP, etc.
• con: less functionality (restricted by protocol)
• pros: less development cost, uses commonly available clients
• observation: this approach is now the most common
• The ones listed on slide 2 fit into this category
Early TCP/IP DLs
• a very old one: IETF: http://www.ietf.org/
• Internet RFCs
• Very first TCP/IP DL?
Early TCP/IP DLs
• Netlib
– http://www.netlib.org/
– begun in 1985, distributing mathematical software via e-mail (SMTP)
– other access methods and protocols added (ftp, X11 client, http)
Netlib 1995
Netlib 2001
Los Alamos arXiv
• Physics pre-print server
– http://xxx.lanl.gov/ == http://arXiv.org
– begun in 1991 as an e-mail service to exchange TeX source of pre-prints in high energy physics
– ftp, http access added shortly afterwards
– Now THE communication channel in Physics
– Paul Ginsparg
Characteristics of early TCP/IP, non-HTTP DLs
• Useful – could get the “thing” that you were looking for
• Constrained by transport protocol
– SMTP, FTP, etc. interface inherently “clunky”
– Higher level services such as searching, sophisticated browsing, etc. difficult to implement
• Small scale
– would the same systems work well if the holdings went from 100’s or 1000’s to millions?
Characteristics of early TCP/IP, HTTP DLs
• Initial HTTP implementations / conversions pretty much provided incremental steps in DL improvement
– a “nice” ftp interface, maybe with better searching and browsing
– but the nature of the DLs changed little
• LTRS is an example of an http DL that is really: FTP + Searching (WAIS) + Browsing
– http://techreports.larc.nasa.gov/ltrs/
• Also check out the user interface of http://arXiv.org
Early TCP/IP, HTTP DLs
• But http is a very general transport protocol, and it is possible to build even higher level protocols on top of it
• Combine this with the expressive HTTP client (web browser), and there is a lot of potential
• Dienst
– (http://www.ncstrl.org/Dienst/htdocs/Info/protocol4.html)
– builds an actual DL protocol on top of HTTP
    • 1994: the first to do so?
• Open Archives Initiative: metadata harvesting protocol on top of HTTP
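To make "a protocol on top of HTTP" concrete, here is a minimal sketch of how a metadata harvesting request can be expressed as an ordinary HTTP GET URL. It assumes the verb and argument names of the later OAI-PMH 2.0 specification (`verb=ListRecords`, `metadataPrefix=oai_dc`); the base URL and the helper name `harvest_url` are hypothetical, and Dienst's own request syntax is not shown.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint; not a real service.
BASE_URL = "http://repository.example.org/oai"


def harvest_url(base_url: str, verb: str, **arguments: str) -> str:
    """Encode a harvesting request as a plain HTTP GET URL."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Ask the repository for all records in unqualified Dublin Core.
url = harvest_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

The protocol semantics live entirely in the query arguments, so any HTTP client, including a web browser, can issue the request.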
Sophistication increases, tracks meet
[Figure: sophistication plotted against time for two tracks, a research track and a library automation track. Labelled stages: ftp / gopher; http (LTRS, e-print, Netlib, etc.); http Dienst.]
A Framework for Distributed Digital Object Services
Kahn/Wilensky Framework [Kahn 1995]
• 1995
• A high level document
• Almost a definition of key concepts, terminologies, … for next generation DLs
• Foundation for a research discipline?
• Not detailed enough to be a real architecture.
• Architecture is independent of the type of data stored in the DL
KWF: key terms
• digital object (do)
– A do is a data structure that contains
    • Digital data; data is typed (cf MIME)
    • Persistent Key Metadata; especially handle
    • Other metadata (for instance Terms and Conditions)
• handle
– a handle is a unique, persistent name for a do
• repository
– The place where do’s live
– Has unique global name
• Repository Access Protocol (RAP)
– To deposit/access do’s in repositories
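The key terms can be read as a small data model. The sketch below is one possible rendering, not something KWF prescribes: the class and field names, the handle string, and the use of a dictionary for "other metadata" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Handle:
    """A unique, persistent name for a digital object."""
    value: str


@dataclass
class DigitalObject:
    handle: Handle                  # persistent key metadata
    data: bytes                     # the digital data
    data_type: str                  # type label for the data (cf. MIME)
    other_metadata: dict = field(default_factory=dict)


@dataclass
class Repository:
    name: str                                      # unique global name
    objects: dict = field(default_factory=dict)    # Handle -> DigitalObject


report = DigitalObject(
    handle=Handle("example/report-1"),             # made-up handle string
    data=b"report text",
    data_type="bit-sequence",
    other_metadata={"terms_and_conditions": "public"},
)
repo = Repository(name="example-repository")
repo.objects[report.handle] = report               # the do now lives here
```

Making `Handle` immutable mirrors the requirement that a handle is persistent, and lets it serve as a dictionary key.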
KWF: flow
[Figure: flow of a digital object through the Kahn/Wilensky Framework, described below]
• An originator makes a digital object, which consists of data and key-metadata (handle)
– the handle comes from a handle generator
• The do can go in a repository, at which point the do becomes a stored do
• The repository registers the do’s handle with a handle server, at which point the do becomes a registered do
• The repository keeps a properties record per do
– Key metadata: handle
– Other metadata: terms and conditions
• The repository keeps a transaction record per do
• A client accesses/deposits the do in repositories by means of the Repository Access Protocol
• What the client receives as a result of an access to a do is a dissemination.
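The flow can be walked through in code. This is a rough sketch under stated assumptions: the handle naming scheme, the idea that the handle server maps a handle to a repository name, and all class and method names are invented for illustration and are not defined by KWF.

```python
import itertools


class HandleGenerator:
    """Hands out unique handle strings (the naming scheme is made up)."""

    def __init__(self, prefix: str):
        self._prefix = prefix
        self._counter = itertools.count(1)

    def new_handle(self) -> str:
        return f"{self._prefix}/{next(self._counter)}"


class HandleServer:
    """Assumed here to map a handle to the repository that holds the do."""

    def __init__(self):
        self.locations = {}

    def register(self, handle: str, repository_name: str) -> None:
        self.locations[handle] = repository_name


class Repository:
    def __init__(self, name: str, handle_server: HandleServer):
        self.name = name
        self._handle_server = handle_server
        self.stored = {}            # handle -> data
        self.registered = set()     # handles registered with the server

    def deposit(self, handle: str, data: bytes) -> None:
        self.stored[handle] = data                        # a stored do
        self._handle_server.register(handle, self.name)
        self.registered.add(handle)                       # a registered do

    def access(self, handle: str) -> bytes:
        return self.stored[handle]                        # the dissemination


generator = HandleGenerator("example")
server = HandleServer()
repo = Repository("example-repository", server)

handle = generator.new_handle()          # originator obtains a handle
repo.deposit(handle, b"report text")     # do is stored, then registered
dissemination = repo.access(handle)      # what a client receives
```

Here storing and registering happen in one step; a repository could equally keep them as separate processes.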
Digital objects
• do = data + key-metadata
– data is typed; core types include:
    • bit-sequence / set-of-bit-sequences
    • digital-object / set-of-digital-objects
    • handle / set-of-handles
– other types can be defined, and registered with a global type registry
    • definition and registration left undefined
    • ~ similar to MIME
– key-metadata includes handle
– possibly other metadata (left undefined in KWF)
Digital objects
• Composite do’s:
– a do with data of type digital-object
– non-composite do’s are elemental do’s
– composite do’s can, for instance, be used to collect similar works together
    • the composite do then contains a do for each work of Shakespeare...
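A composite do is naturally a recursive structure. The following sketch assumes a simple in-memory representation in which a composite holds its parts directly; the class, the helper `elemental_handles`, and the handle strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class DigitalObject:
    handle: str
    data_type: str                                   # e.g. "bit-sequence"
    data: Union[bytes, List["DigitalObject"]]

    def is_composite(self) -> bool:
        # Composite: the data is itself one or more digital objects.
        return self.data_type in ("digital-object", "set-of-digital-objects")


def elemental_handles(do: DigitalObject) -> List[str]:
    """Collect the handles of all elemental do's inside a (nested) do."""
    if not do.is_composite():
        return [do.handle]
    handles = []
    for part in do.data:
        handles.extend(elemental_handles(part))
    return handles


hamlet = DigitalObject("example/hamlet", "bit-sequence", b"...")
macbeth = DigitalObject("example/macbeth", "bit-sequence", b"...")
works = DigitalObject(
    "example/shakespeare", "set-of-digital-objects", [hamlet, macbeth]
)
```

Calling `elemental_handles(works)` walks the collection and returns the handles of the individual works.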
Changing digital objects
• Mutable do’s can be changed once placed in a repository
– key-metadata cannot be changed
– the do’s handle never changes!
• Immutable do’s cannot be changed once placed in a repository
– however, they can be deleted
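The mutability rules translate into a few checks at the repository boundary. This is a minimal sketch, assuming the repository records a mutable flag at deposit time; the class, method, and exception names are invented for illustration.

```python
class ImmutableObjectError(Exception):
    """Raised when a change is attempted on an immutable do."""


class Repository:
    def __init__(self):
        self._data = {}       # handle -> data
        self._mutable = {}    # handle -> bool, fixed at deposit time

    def deposit(self, handle: str, data: bytes, mutable: bool) -> None:
        if handle in self._data:
            raise KeyError(f"handle already in use: {handle}")
        self._data[handle] = data
        self._mutable[handle] = mutable

    def replace_data(self, handle: str, new_data: bytes) -> None:
        if not self._mutable[handle]:
            raise ImmutableObjectError(handle)
        # Only the data changes; the handle (the key) stays the same.
        self._data[handle] = new_data

    def delete(self, handle: str) -> None:
        # Deletion is allowed for mutable and immutable do's alike.
        del self._data[handle]
        del self._mutable[handle]

    def access(self, handle: str) -> bytes:
        return self._data[handle]


repo = Repository()
repo.deposit("example/1", b"draft", mutable=True)
repo.deposit("example/2", b"final", mutable=False)
repo.replace_data("example/1", b"revised draft")
```

There is deliberately no operation for changing a handle: the only way to get a different handle is to deposit a new do.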
Handles
• Guest lecture by Professor Arms 02/19
Repositories
• A network accessible storage system in which digital objects may be stored for possible subsequent access or retrieval
• A stored do is a do that resides in a repository
• A registered do is a do that the repository has registered with a handle server
– storing and registering can be the same or different processes
Repositories
• A repository keeps a properties record for each do
– contains key-metadata and any other metadata the repository chooses to keep
• A do may have a transaction record associated with it in a repository
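One way to picture the two records is as bookkeeping kept alongside the stored data. In this sketch the contents of the transaction record (an operation name plus a timestamp) are an assumption, as are all class and method names; KWF leaves the details open.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PropertiesRecord:
    handle: str                                       # key-metadata
    other_metadata: dict = field(default_factory=dict)


class Repository:
    def __init__(self):
        self._data = {}           # handle -> data
        self.properties = {}      # handle -> PropertiesRecord
        self.transactions = {}    # handle -> list of (operation, time)

    def _log(self, handle: str, operation: str) -> None:
        entry = (operation, datetime.now(timezone.utc))
        self.transactions.setdefault(handle, []).append(entry)

    def deposit(self, handle: str, data: bytes, **metadata) -> None:
        self._data[handle] = data
        self.properties[handle] = PropertiesRecord(handle, dict(metadata))
        self._log(handle, "deposit")

    def access(self, handle: str) -> bytes:
        data = self._data[handle]
        self._log(handle, "access")
        return data


repo = Repository()
repo.deposit("example/1", b"report", terms="public")
repo.access("example/1")
operations = [op for op, _ in repo.transactions["example/1"]]
```

After the deposit and one access, `operations` lists both events in order.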
Repository Access Protocol
• “Protocol” may be misleading; it’s really just the concept for a protocol
• RAP is designed to be simple; higher level services should come from other protocols
• KWF defines 3 basic operation classes:
– ACCESS_DO [metadata; key-metadata, digital object]
    • A dissemination of a do is the result of a request to access a do
– DEPOSIT_DO [metadata; key-metadata, digital object]
– ACCESS_REF
    • this is a means to tell the world about other ways (protocols) to access do’s in the repository.
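Since RAP is only the concept for a protocol, any concrete interface is a design choice. The sketch below maps the three operation classes onto three methods; the method signatures, the dictionary shapes, and the example access URL are assumptions, not part of KWF.

```python
class Repository:
    def __init__(self, name: str, other_access=()):
        self.name = name
        self._objects = {}                       # handle -> stored do
        self._other_access = list(other_access)  # other protocols offered

    def deposit_do(self, handle: str, data: bytes, metadata=None) -> None:
        """DEPOSIT_DO: place a digital object in the repository."""
        self._objects[handle] = {"data": data, "metadata": dict(metadata or {})}

    def access_do(self, handle: str) -> dict:
        """ACCESS_DO: the returned value plays the role of a dissemination."""
        stored = self._objects[handle]
        return {"handle": handle, "data": stored["data"]}

    def access_ref(self) -> list:
        """ACCESS_REF: advertise other ways to reach do's in this repository."""
        return list(self._other_access)


repo = Repository(
    "example-repository",
    other_access=["http://repository.example.org/browse"],  # hypothetical
)
repo.deposit_do("example/1", b"report", metadata={"title": "A report"})
dissemination = repo.access_do("example/1")
```

Searching and browsing are intentionally absent: in this design they would be offered by the services that `access_ref` points to.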
Terms and Conditions
• TC are attached to:
– each do
– each dissemination
– each repository
• TC are a precondition for any operation on the above
• Repositories are responsible for enforcing TC
Terms and Conditions
[Figure 1 from 95 TR-1593: a repository, a digital object and a dissemination, each with its own terms and conditions; the digital object and the dissemination each contain data. The relationships are annotated with cardinalities (1 and N).]
Digital Objects: Terms and Conditions
• Set by originator and/or repository
• Can be arbitrarily complex, but generally consist of:
– permissions: read, write, etc.
– authentication - person, group, etc.
– payment
– 3rd party intervention (possibly in support of the above)
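Treating terms and conditions as a precondition means checking them before any operation runs. This sketch models only the permissions part (read, write), with terms attached to the repository and to each do; authentication, payment and third-party intervention are left out, and every name in it is hypothetical.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when terms and conditions do not permit an operation."""


@dataclass
class TermsAndConditions:
    # Maps a user or group name to the set of operations it may perform.
    permissions: dict = field(default_factory=dict)

    def permits(self, user: str, operation: str) -> bool:
        return operation in self.permissions.get(user, set())


class Repository:
    def __init__(self, terms: TermsAndConditions):
        self.terms = terms       # TC attached to the repository
        self._objects = {}       # handle -> (data, TC attached to the do)

    def _require(self, terms, user: str, operation: str) -> None:
        if not terms.permits(user, operation):
            raise AccessDenied(f"{user} may not {operation}")

    def deposit(self, user, handle, data, object_terms) -> None:
        self._require(self.terms, user, "write")
        self._objects[handle] = (data, object_terms)

    def access(self, user, handle) -> bytes:
        data, object_terms = self._objects[handle]
        self._require(self.terms, user, "read")     # repository-level TC
        self._require(object_terms, user, "read")   # object-level TC
        return data


repo_terms = TermsAndConditions({"alice": {"read", "write"}, "bob": {"read"}})
repo = Repository(repo_terms)
repo.deposit("alice", "example/1", b"report",
             TermsAndConditions({"alice": {"read"}}))
```

In this setup bob may read from the repository in general but not this particular do, so `repo.access("bob", "example/1")` raises `AccessDenied`.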
Readings
• Kahn, R. & Wilensky, R. 1995. A Framework for Distributed Digital Object Services. http://WWW.CNRI.Reston.VA.US/home/cstr/arch/k-w.html
• Arms, W.Y. 1995. Key Concepts in the Architecture of the Digital Library. In: D-Lib Magazine. http://www.dlib.org/dlib/July95/07arms.html
• VanHeyningen, M. 1994. The Unified Computer Science Technical Report Index: Lessons in indexing diverse resources. http://www.cs.indiana.edu/ucstri/paper/paper.html