1
Symposium: Open Access to Information
Panel 3: BDTD (Biblioteca Digital de Teses e Dissertações)
25 August 2006, Brasilia
NDLTD: From Local to National to Global
http://fox.cs.vt.edu/talks/2006/20060825IBICTp3
Edward A. Fox
Director, NDLTD
Chair, IEEE-CS Tech. Committee on Digital Libraries
Professor, Department of Computer Science
Director, Digital Library Research Laboratory
Virginia Tech, Blacksburg, VA 24061 USA
2
Global Scope
• One-stop shopping for access
• Search engine companies want a single contact to reach a large number of sites
• Spokesman needed for publicity, partnering, advocacy, monitoring, …
• Annual international conference: next in Sweden, then United Kingdom
• Research: cross-language, classification, preservation, plagiarism detection, …
3
Outline
• Acknowledgements
• Key Ideas – With Proofs
• ETD 2005 Concepts
• Institutional Repositories
• UK Report
• NDLTD
• DL Futures
4
Acknowledgements
• Students
• Faculty, Staff
• Collaborators
• Support
• Mentors
5
Key Ideas - Overview
• Theorem 1: Supporters of Open Access should support NDLTD.
• Theorem 2: 5S can guide us to better support of Open Access.
6
Theorem 1: Supporters of Open Access should support NDLTD - 1
• DLs will lead to enormous benefit at all levels, from personal to global.
• An IR is a type of DL, in the middle of the levels (requiring support from below, and providing support for above levels).
• Having a DL at every university (i.e., IR) greatly encourages Open Access.
7
Theorem 1: Supporters of Open Access should support NDLTD - 2
• The easiest way to launch an IR at a university is with ETDs.
• NDLTD is the lead world organization promoting ETD activities.
• NDLTD’s goals are all in support of Open Access and IRs.
8
Theorem 2: 5S can guide us to better support of Open Access - 1
• 5S helps us think formally about Open Access, hence clearly, hence to find focus.
• 5S helps us design and build DLs, hence IRs.
• Societies
  – Individuals: members of institution, discipline
  – Social influence can promote DL (re)use.
  – Economic and political and social issues lead us to a distributed architecture.
9
Theorem 2: 5S can guide us to better support of Open Access - 2
• Distributed infrastructure + services lead us to harvesting (vs. federation, gathering).
• 5S helps make harvesting a success:
  – Streams of content flow from individuals.
  – Structures: ETD-ms, (browsing) classification
  – Spaces: indexes, interfaces
  – Scenarios: submission, workflow, harvesting
  – Societies (see above)
• More collaboration (social networks)
• Prestige is more widely spread.
• Access is more open.
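The harvesting idea on this slide can be illustrated with a small in-memory sketch. It is not NDLTD's implementation: the class names `DataProvider` and `ServiceProvider`, the record fields, and the date-based incremental pull are all assumptions chosen for illustration. The point it shows is that a service provider copies metadata from each site ahead of time, so searches run on a local index rather than contacting every site per query (as federation would).

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    identifier: str
    datestamp: str  # ISO date string, e.g. "2006-08-25"
    title: str


@dataclass
class DataProvider:
    """Hypothetical university repository that exposes its ETD metadata."""
    name: str
    records: list = field(default_factory=list)

    def list_records(self, since=""):
        # Return only records stamped after the given date (ISO strings sort by date).
        return [r for r in self.records if r.datestamp > since]


@dataclass
class ServiceProvider:
    """Hypothetical central service that keeps a local copy of harvested metadata."""
    index: dict = field(default_factory=dict)
    last_harvest: dict = field(default_factory=dict)

    def harvest(self, provider):
        # Pull only what changed since the previous harvest of this provider.
        since = self.last_harvest.get(provider.name, "")
        new_records = provider.list_records(since)
        for record in new_records:
            self.index[record.identifier] = record
        if new_records:
            self.last_harvest[provider.name] = max(r.datestamp for r in new_records)
        return len(new_records)

    def search(self, word):
        # The search touches only the local index; no site is contacted at query time.
        return sorted(r.identifier for r in self.index.values()
                      if word.lower() in r.title.lower())
```

Calling `harvest` twice in a row on an unchanged provider returns 0 the second time, which is the property that keeps repeated harvesting cheap for participating sites.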
10
Conference Summary Words - 1
accessibility aggregation alert
annotate archive arts
attitudes authentication authoring
authorization automation browse
catalog collaboration community
components context conversion
customer decentralized digitize
discourse discovery dissemination
DSpace federated Fedora
global grid economic
harvesting ingest innovation
institutional integrity interaction
11
Conference Summary Words - 2
interchange interoperability knowledge
LOCKSS management metadata
national OCR organization
partnership PDF (/A) podcasting
portal preservation provider
regional repository retrieval
scalability Scirus search
server service sharing
standardization strategic student
summarization sustainable testimonial
toolkit training tutorial
Unicode usable VALET
XML XSLT workflow
12
Conference Summary Phrases - 1
alumni development always on
business model concept map
content management copyright compliance
cost effective Creative Commons
creative material cross language
dark archive developing country
digital library digital rights management
digital signature disruptive technology
document model Dublin Core
13
Conference Summary Phrases - 2
e-knowledge e-publishing
e-research e-science
full text Google Scholar
institutional repository LDAP server
learning object mandatory deposit
Million Book Project national initiative
Net Gen OAI PMH
online digital studio open access
Open Archives Initiative open source
14
Conference Summary Phrases - 3
persistent identifiers postgraduate research
public domain restricted access
retrospective conversion scholarly communication
server log service oriented architecture
social network stepping stone
subject gateway survey data
union catalog unlocking IP
user centered value added
voluntary participation walking the talk
web based web services
15
Institutional Repositories - 1
• “Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.”
• Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA
• www.arl.org/sparc/IR/IR_Guide_v1.pdf
16
Institutional Repositories - 2
• “A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”
• Lynch, C.A. In ARL Bimonthly Report 226, pp. 1-7, Feb. 2003, www.arl.org/newsltr/226/ir.html
17
Prospero: Summary of features of the three software packages compared
• What you get
  – DSpace: A package with front-end web interface directly linked to a database
  – E-prints: A package with front-end web interface directly linked to a database
  – Fedora: A repository database, with internal database.
• Server requirements
  – DSpace: Unix environment, Java, Apache Ant, Apache Tomcat, PostgreSQL or Oracle
  – E-prints: Unix environment, Perl, Apache+mod-perl, MySQL
  – Fedora: Unix or Windows, Java. (optional: MySQL or Oracle)
• Subject classification
  – DSpace: Yes
  – E-prints: Yes
  – Fedora: Yes
• Community groups
  – DSpace: Yes
  – E-prints: No
  – Fedora: Possible but … (see below)
• Where from?
  – DSpace: MIT and Hewlett-Packard.
  – E-prints: Southampton University, outcome of a JISC project.
  – Fedora: Cornell University and the University of Virginia Library.
18
UK Report of Aug. 2006
• EVALUATION OF OPTIONS FOR A UK ELECTRONIC THESIS SERVICE
• Study report edited by Alma Swan
• Key Perspectives Ltd & UCL Library Services
• EThOS project (Electronic Theses Online Service) - commissioned to develop a model for a workable, sustainable and acceptable national service for the provision of open access to electronic doctoral theses.
19
EThOS: Stakeholders
• Academic registrars
• University administrators (graduate schools)
• Librarians
• Repository managers
• Authors (or potential authors) of theses and dissertations
20
EThOS: Issues
• Electronic thesis provision status in the UK and the reasons for its slow development
• Drivers for change in the provision of e-theses
• The administrative and academic contexts in which a national UK e-thesis service would need to operate
• Constraints that might apply, or which have applied until the present
• Architectures and service models for e-thesis provision
• Technical standards
• IPR and other rights issues
• Business models
21
Elements of an ETD Initiative - 1
• The hub: the central focus of the service may offer multiple resources and subservices or, at the other end of the scale, may be a simple resource discovery service. The hub may point at theses located in host institutions or it may contain the full text of theses itself
• Submission procedure: the simplicity of and requirements for this can vary
• Metadata structure and format: the required metadata formats can vary from very simple Dublin Core with few elements to a deeply descriptive specially developed metadata scheme with many elements
• Metadata dissemination: services vary in the extent to which they disseminate thesis metadata – some only expose it themselves, while others disseminate it via multiple discovery services and routes
• Accepted file formats: some services accept multiple file formats, some few and some just one
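To make the "very simple Dublin Core with few elements" end of the metadata spectrum concrete, here is a minimal sketch that serializes a handful of Dublin Core elements for one thesis. The wrapper element name `record`, the helper name `simple_dc_record`, and the sample values are assumptions for illustration; only the Dublin Core element namespace is standard. A richer scheme such as ETD-MS would add thesis-specific elements on top of this.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the 15 simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def simple_dc_record(fields):
    """Serialize (element_name, value) pairs as a small Dublin Core XML record.

    `fields` is a list of pairs so that repeatable elements (e.g. several
    subjects) keep their order. The <record> wrapper is a hypothetical choice.
    """
    root = ET.Element("record")
    for name, value in fields:
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical thesis metadata using only a few DC elements.
example_xml = simple_dc_record([
    ("title", "An Example Doctoral Thesis"),
    ("creator", "Doe, Jane"),
    ("date", "2006"),
    ("type", "Electronic Thesis or Dissertation"),
    ("subject", "digital libraries"),
    ("subject", "open access"),
])
```

Keeping the input as an ordered list of pairs, rather than a dictionary, is a small design choice that lets repeatable elements such as `subject` appear more than once.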
22
Elements of an ETD Initiative - 2
• Digitisation: a digitisation service may be part of the offering. If it is, it may be on demand or there may be a mass retrodigitisation programme on offer
• Thesis level: services may offer only doctoral theses or may extend their coverage to masters theses and even undergraduate dissertations
• Copyright and IPR: services may incorporate advice and practical help on rights issues
• Plagiarism: services may offer a plagiarism detection scheme
• Business model: under this heading fall issues such as: whether theses are offered on a pure Open Access basis or whether the access is paid for; whether royalties are paid to authors; how digitisation costs are covered and so forth
23
E-theses Services: Summary of characteristics - 1
Providers compared: ADT, DiVA, Theses Canada, NDLTD, DART-Europe, EThOS
• Coverage
  – ADT: Originally 7 in CAUL: now open to all Australian and NZ universities
  – DiVA: 16 universities in Sweden and Norway; open to all universities
  – Theses Canada: 60 universities in Canada
  – NDLTD: Any voluntarily participating institutions
  – DART-Europe: Pan-European service for voluntary national/European consortia
  – EThOS: Any voluntarily participating UK HE institution
• Hub
  – ADT: ADT
  – DiVA: DiVA Portal www.diva-portal.org
  – Theses Canada: Theses Canada Portal
  – NDLTD: www.theses.org
  – DART-Europe: DART-Europe repository and portal (DEEP) in 2007
  – EThOS: EThOS (at British Library)
• Submission procedure
  – ADT: At host institution
  – DiVA: By author at host univ. Template supplied
  – Theses Canada: To ProQuest to digitize, or universities can provide metadata.
  – NDLTD: At host institutions. NDLTD encourages
  – DART-Europe: At host institution
  – EThOS: By author at host univ.; central service for retrodigitisation and submission
• Metadata format
  – ADT: Details soon
  – DiVA: 99 metadata elements
  – Theses Canada: MARC 21, ETD-MS, Dublin Core
  – NDLTD: ETD-MS (ETD metadata standard). Crosswalks
  – DART-Europe: Developing a standard with 8 DC elements
  – EThOS: EThOS set of 15 qualified DC elements
24
E-theses Services: Summary of characteristics – 2
Providers compared: ADT, DiVA, Theses Canada, NDLTD, DART-Europe, EThOS
• Funded by
  – ADT: Australian Research Council (at least initially)
  – DiVA: Universities’ consortium (of the participating universities)
  – Theses Canada: LAC funds except for ProQuest’s fee (funded by Theses Canada and universities)
  – NDLTD: Membership dues and membership in-kind contributions
  – DART-Europe: Business model currently being discussed
  – EThOS: Participating institutions, via choice of options
• Digitisation provision by service?
  – ADT: At deposit. Also on request of theses
  – DiVA: No
  – Theses Canada: By ProQuest
  – NDLTD: No
  – DART-Europe: Yes, probably mass retrodigitisation if funding secured
  – EThOS: Yes
• Number of theses
  – ADT: N/A
  – DiVA: 11,500 theses, 500 other pubs (reports, books)
  – Theses Canada: c. 50,000 ETDs; c. 250,000 TDs (incl. ETDs)
  – NDLTD: Over 250,000
  – Estimated 25,000/yr digitisations (DART-Europe or EThOS; column unclear)
• Masters theses
  – ADT: Yes?
  – DiVA: Yes
  – Theses Canada: Yes
  – NDLTD: Yes
  – DART-Europe: Yes?
  – EThOS: No plans
• Open Access
  – ADT: Yes
  – DiVA: Yes
  – Theses Canada: 1998-2002, and to ETDs harvested
  – NDLTD: Yes
  – DART-Europe: Yes
  – EThOS: Yes
25
EThOS Survey: my institution’s policy position on PhD e-theses
• 55% no policies yet
• 34% currently planning policies
• 11% have a policy
26
EThOS Survey: very important driver of a national e-thesis service
• 60% e-theses are more accessible
• 48% paper theses are not easily accessible
• 31% will contribute to institution’s visibility
• 21% storage space for print theses
• 19% want national support since it is slow to launch a local service
• 16% increasing interest in electronic preservation of research
27
EThOS Survey: very useful value-added services for PhD e-theses
• 59% long-term digital creation of harvested e-theses in a central archive
• 51% optimizing exposure to search engines
• 48% digital copy for local host institutions
• 35% IPR checks against deposited works
• 35% support for non-text elements
• 34% plagiarism checks
• 33% link to repositories of primary data used
28
EThOS: Benefits
• Hugely increased visibility of UK doctoral research output
• Resulting in increased usage and impact of UK doctoral research output
• The opportunities for resulting new research efforts and collaborations
29
EThOS: Opportunities
• Being able to provide a world-class electronic theses service to showcase the UK’s doctoral research
• Providing an example of good practice and the impetus for other nations to develop electronic theses services of their own
• Possible commercial opportunities for value-added service providers
A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org
31
NDLTD Incorporation
• Incorporated May 20, 2003 in Virginia, USA
• Charitable and educational purposes (501 c 3)
• Officers
  – Executive Director (Ed Fox)
  – Secretary (Gail McMillan)
  – Treasurer (Scott Eldredge)
• Now:
  – 250K metadata records in Union Catalog
  – ~50 full members, ~200 associated members
32
Board of Directors (2006)
• Suzie Allard (ETD 2004, U. Kentucky)
• Denise A. D. Bedford (World Bank)
• Julia C. Blixrud (ARL, SPARC)
• José Luis Borbinha (Natl Lib Portugal)
• Alex Byrne (ETD 2005, ADT: Australia)
• Tony Cargnelutti (ETD 2005, Australia)
• Vinod Chachra (VTLS)
• William Clark (Ohio State U.)
• Susan Copeland (RGU, UK)
• Jude Edminster (Bowling Green St. U.)
• Scott Eldredge (Treasurer, ETD 2002, BYU)
• Edward A. Fox (Exec Director, Virginia Tech)
• John H. Hagen (West Virginia U.)
• Thomas B. Hickey (OCLC)
• Christine Jewell (U. Waterloo, Canada)
• Joan K. Lippincott (CNI)
• Mike Looney (Adobe)
• Austin McLean (ProQuest)
• Gail McMillan (Secretary, Virginia Tech)
• Joseph Moxley (ETD 2000, USF)
• Eva Müller (U. Uppsala, Sweden)
• Ana Pavani (PUC Rio, Brazil)
• Sharon Reeves (National Library Canada)
• Peter Schirmbacher (ETD 2003, Humboldt)
• Hussein Suleman (U. Cape Town, S. Africa)
• Shalini R. Urs (U. Mysore, India)
• Eric F. Van de Velde (ETD 2001, Caltech)
33
NDLTD Committees (Chairs)
• Awards (John Hagen)
• Conferences (Sharon Reeves)
• Development (Peter Schirmbacher)
• Executive (Edward Fox)
• Finance (Scott Eldredge)
• Implementation (Ana Pavani)
• Membership (Tony Cargnelutti)
• Nominating (Joan Lippincott)
• Standards (Thomas B. Hickey)
• Union Catalog (Vinod Chachra)
34
Selected Projects / Sponsors
• Australia (ADT)
• Brazil (BDTD, IBICT)
• Canada
• Catalunya
• Chile (Cybertesis)
• China (CALIS)
• Germany
• India (Vidyanidhi)
• Korea
• OhioLINK: 79 colleges/univs
• Portugal (National Library)
• South Africa
• UK (British Library, JISC, Edinburgh, …)
• UNESCO (especially Latin America, Eastern Europe, Africa)
• …
35
UNESCO and ETDs (by Axel Plathe at ETD2003)
• Promoting the use of the Internet as a tool for disseminating scientific knowledge
• Facilitating the transfer of ETD expertise from developed to developing countries
• 1998: Member of the NDLTD Steering Committee
• 1999: First UNESCO ETD meeting on ETD internationalisation
• 2002: “UNESCO Guide to Electronic Theses and Dissertations”
• 2003: Model training programmes and training courses
• 2003: Sponsor pilot projects
• 2003: Pilot projects (Africa, Europe, Latin-America)
36
Some Countries
• Australia
• Belgium
• Brazil
• Canada
• Chile
• China, Hong Kong
• Colombia
• Finland
• France
• Germany
• Greece
• India
• Italy
• Jamaica
• Korea
• Lithuania
• Malaysia
• Mexico
• Namibia
• Netherlands
• Norway
• Poland
• Russia
• Singapore
• S. Africa
• S. Korea
• Spain
• Sudan
• Sweden
• Switzerland
• Taiwan
• Thailand
• Turkey
• UK
• USA
• Venezuela
• Yugoslavia
37
NDLTD Members - 1
Ball State University
Brigham Young University
California Institute of Tech.
Consorci de Biblioteques Universitàries de Catalunya
Duke University
Georg August Universität Göttingen
George Washington University
Georgetown University
Georgia Institute of Technology
Georgia Southern University
Georgia State University
Government of Canada
Griffith University
Johns Hopkins University
Kauno Technologijos Universitetas
Louisiana State University
L'Université du Québec à Rimouski
McGill University
New Jersey Institute of Technology
North Carolina Central University
North Carolina State
Ohio University
38
NDLTD Members - 2
Oregon State U. Library
Penn State University
Pontifícia Universidade Católica do Rio de Janeiro
Portugal National Library
Rita Chu (individual)
Simon Fraser University
State of Kansas
Texas Tech University
U. de las Américas, Puebla
Universität St. Gallen
U. Glasgow
U. Maine
U. Missouri
U. North Carolina Chapel Hill
U. Pittsburgh
U. Pretoria
U. South Florida
U. Tennessee
U. Waterloo
Virginia Tech
West Virginia U. Libraries
Worcester Polytechnic Institute
Yale University
39
Why ETD? Short Answer
• For Students:
  – Gain knowledge and skills for the Information Age
  – Richer communication (digital information, multimedia, …)
• For Universities:
  – Easy way to enter the digital library field and benefit thereby
• For the World:
  – Global digital library – large, useful, many services
• General:
  – Save time and money
  – Increased visibility for all associated with research results
40
NDLTD: How can a university get involved?
• Select planning/implementation team
  – Graduate School
  – Library
  – Computing / Information Technology
  – Institutional Research / Educ. Tech.
• Join online, give us contact names
  – www.ndltd.org/join
• Adapt Virginia Tech or other proven approach
  – Build interest and consensus
  – Start trial / allow optional submission
42
How? Steps
• Attend ETD xx
• Join NDLTD
• Launch initiative, dialog, encourage
• Pilot -> requirement
• OAI data provider
• DINI-Certificate
• Log, survey, analyze, improve
• Help other sites
• Serve on NDLTD committees
• Extend services: preservation, inst. rep., …
43
Union catalog: OCLC
• OCLC will expand OAI data provider on TDs.
• Is getting data from WorldCat (so, from many sites!).
• Will harvest from all others who contact them.
• Need DC and either ETD-MS or MARC.
• Has a set for ETDs.
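As a rough illustration of what "harvesting from an OAI data provider" involves, the sketch below builds an OAI-PMH `ListRecords` request and extracts record identifiers and the resumption token from a response. The verb, the `metadataPrefix`, `set`, and `resumptionToken` arguments, and the response namespace come from the OAI-PMH 2.0 protocol; the base URL `https://example.org/oai`, the set name `ETD`, and the helper names are assumptions, not OCLC's actual endpoint or set. No network call is made here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    # Header identifiers live in the OAI namespace, so dc:identifier is not matched.
    identifiers = [e.text for e in root.iter(f"{{{OAI_NS}}}identifier")]
    token = root.find(f".//{{{OAI_NS}}}resumptionToken")
    has_token = token is not None and bool(token.text)
    return identifiers, (token.text if has_token else None)


# Hypothetical first request for a set of ETD records.
first_url = list_records_url("https://example.org/oai", set_spec="ETD")
```

A harvester would fetch `first_url`, call `parse_list_records` on the body, and repeat with the returned token until the token comes back as `None`.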
44
OCLC SRU Interface
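For readers unfamiliar with SRU (Search/Retrieve via URL), a search is an ordinary HTTP GET whose parameters carry a CQL query. The sketch below only constructs such a URL. The parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) are from the SRU 1.1 specification; the base URL, the helper names, and the `dc.title` index in the example query are assumptions and may not match what the OCLC service actually supports.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sru_search_url(base_url, cql_query, max_records=10, start_record=1):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    # urlencode percent-escapes the quotes, spaces and '=' inside the CQL query.
    return base_url + "?" + urlencode(params)


def query_params(url):
    """Decode the query string of a URL back into a dict of value lists."""
    return parse_qs(urlsplit(url).query)


# Hypothetical endpoint and index name, for illustration only.
example_url = sru_search_url("https://example.org/sru",
                             'dc.title = "digital library"', max_records=5)
```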
45
VTLS
• VTLS offers its free VALET system to manage ETDs at institutions, building upon Fedora, as well as VTLS software.
• VTLS runs a service provider atop the Union Catalog. It supports multilingual access through the interface, to metadata.
46
LOCKSS
• Lots of copies keep stuff safe
• Stanford (Vicky Reich)
• Initial focus on lower levels
• Initial content: journals
• Emory (Martin Halbert)– Help deploy and adapt– Help apply in other contexts, e.g., ETDs
• Experiments, studies of Int’l ETD service– Humboldt, PUC Rio, U. Cape Town, VT, …
47
Full-text Services
• Running since Sept 2005: Scirus
• In beta test: Google Scholar
• Challenges:
  – Broadening the coverage since OAI use has not spread as widely as we would like
  – Understanding use, throughout life cycle
  – Data and DL services quality problems
  – Inconsistency in way to get from metadata to the full-text file(s)
  – Cross-language information retrieval
48
NDLTD cross-language problem
Language     Number
English      123,696
Portuguese   11,434
German       4,131
French       3,868
Spanish      1,561
Chinese      1,463
Catalan      804
Others       19,962 (most unclassified)
Total        166,919 (summer ’05)
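The scale of the cross-language problem is easier to see as shares of the collection. The short calculation below uses only the counts from the table above; the variable names are illustrative.

```python
# Record counts per language, copied from the slide (summer 2005).
counts = {
    "English": 123696,
    "Portuguese": 11434,
    "German": 4131,
    "French": 3868,
    "Spanish": 1561,
    "Chinese": 1463,
    "Catalan": 804,
    "Others": 19962,
}

total = sum(counts.values())

# Percentage share of each language, rounded to one decimal place.
shares = {lang: round(100 * n / total, 1) for lang, n in counts.items()}

# Records that an English-only search interface would serve poorly.
non_english = total - counts["English"]
non_english_share = round(100 * non_english / total, 1)

for lang, share in shares.items():
    print(f"{lang:<12}{share:>6}%")
```

The per-language counts sum to the stated total of 166,919, and roughly a quarter of the records (43,223) are not classified as English.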
49
Ryan Richardson solution to NDLTD cross-language problem
50
Example concept map
51
Supply-Demand Comparison
ETD Resources and User Demands (Number of Queries) in NDLTD
[Bar chart: ETDs vs. Demands for each academic category; vertical axis 0% to 50%]
Academic Categories:
1 Architecture and Design
2 Law
3 Medicine, Nursing and Veterinary Medicine
4 Arts and Science
5 Engineering and Applied Science
6 Business and Commerce
7 Education
8 Others (unclassifiable)
52
User Expertise Years
[Chart: Users' Expertise in Years; x-axis: Years; y-axis: Users, 0 to 200]
53
Date Stamp of ETD
[Chart: x-axis: Year; y-axis scale: 0 to 60,000]
54
Quality Dimensions
DL Concept: Dimensions of Quality
• Digital object: Accessibility, Pertinence, Preservability, Relevance, Similarity, Significance, Timeliness
• Metadata specification: Accuracy, Completeness, Conformance
• Collection: Completeness, Impact Factor
• Catalog: Completeness, Consistency
• Repository: Completeness, Consistency
• Services: Composability, Efficiency, Effectiveness, Extensibility, Reusability, Reliability
55
Quality and the Information Life Cycle
[Figure: information life cycle diagram annotated with quality dimensions]
• Phases: Creation, Distribution, Seeking, Utilization
• Activities: Authoring, Modifying, Describing, Organizing, Indexing, Storing, Archiving, Networking, Accessing, Filtering, Searching, Browsing, Recommending, Retention, Mining, Discard
• States: Active, Semi-Active, Inactive
• Quality dimensions shown: Accessibility, Accuracy, Completeness, Conformance, Pertinence, Preservability, Relevance, Significance, Similarity, Timeliness
56
Metadata Specifications and Metadata Format: Completeness
• OCLC NDLTD Union catalog
[Bar chart: completeness score per site; vertical axis 0 to 1]
Sites: GWUD, LSU, VTETD, MIT, UBC, PHYSNET, VTINDIV, VANDERBILT, NCSU, USASK, PITT, HKU, HUMBOLT, OCLC, BGMYU, DRESDEN, VIENNA, GATECH, ETSU, USF, MUENCHEN, UTENN, CCSD, WATERLOO, NSYSU, LAVAL, UPSALLA, CALTECH, UCL, WagUniv
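The slide does not state how the completeness score is computed. One plausible reading, assumed here, is the fraction of a schema's fields that carry a non-empty value in a record, averaged over a site's records. The field list below is an illustrative subset, not the full ETD-MS element set, and the function names are hypothetical.

```python
# Illustrative subset of metadata fields; the real schema has more elements.
SCHEMA_FIELDS = ["title", "creator", "subject", "description", "date", "language"]


def record_completeness(record, schema_fields=SCHEMA_FIELDS):
    """Fraction of schema fields with a non-empty value (assumed definition)."""
    filled = 0
    for name in schema_fields:
        value = record.get(name)
        if value is not None and str(value).strip():
            filled += 1
    return filled / len(schema_fields)


def site_completeness(records, schema_fields=SCHEMA_FIELDS):
    """Average record completeness for one site; 0.0 if the site has no records."""
    if not records:
        return 0.0
    return sum(record_completeness(r, schema_fields) for r in records) / len(records)


# Two hypothetical records from one site.
sample_records = [
    {"title": "Thesis A", "creator": "Doe, Jane", "date": "2005", "language": "en",
     "subject": "physics", "description": "Abstract text"},
    {"title": "Thesis B", "creator": "Roe, Sam", "date": "2006", "language": "",
     "subject": None},
]
```

Under this definition the first sample record scores 1.0 and the second 0.5, so the site average is 0.75, a value on the same 0 to 1 scale as the chart.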
57
Metadata Specifications and Metadata Format: Conformance
• Based on ETD-MS
[Bar chart: conformance score per site; vertical axis 0.75 to 1]
Sites: GWUD, LSU, VTETD, MIT, UBC, PHYSNET, VTINDIV, VANDERBILT, NCSU, USASK, PITT, HKU, HUMBOLT, OCLC, BGMYU, DRESDEN, VIENNA, GATECH, ETSU, USF, MUENCHEN, UTENN, CCSD, WATERLOO, NSYSU, LAVAL, UPSALLA, CALTECH, UCL, WagUniv
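Conformance, as opposed to completeness, asks whether the values that are present obey the rules of the metadata standard. The slide gives no formula, so the sketch below assumes a simple one: the fraction of rule checks a record passes. The four rules are hypothetical stand-ins and are not taken from the ETD-MS specification.

```python
import re

# Hypothetical rule set: each rule maps a record (dict) to True or False.
RULES = {
    "has_title": lambda r: bool(r.get("title")),
    "has_creator": lambda r: bool(r.get("creator")),
    # Accept YYYY, YYYY-MM or YYYY-MM-DD as a well-formed date.
    "date_is_iso": lambda r: re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?",
                                          r.get("date") or "") is not None,
    # Accept a two- or three-letter lowercase language code.
    "language_is_code": lambda r: re.fullmatch(r"[a-z]{2,3}",
                                               r.get("language") or "") is not None,
}


def conformance(record, rules=RULES):
    """Fraction of rules the record satisfies (assumed definition)."""
    passed = sum(1 for check in rules.values() if check(record))
    return passed / len(rules)


def failed_rules(record, rules=RULES):
    """Names of the rules a record violates, useful for feedback to a site."""
    return [name for name, check in rules.items() if not check(record)]


# Hypothetical record with a free-text date that breaks one rule.
sample = {"title": "Thesis A", "creator": "Doe, Jane",
          "date": "Spring 2005", "language": "en"}
```

Reporting the names of the failed rules alongside the score is a design choice that would let a site see what to fix, not only how it ranks.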
58
DL Futures
• History
• People, Content, Tools
• Sustainable Infrastructure
• For More Information
61
People
• Digital librarians
• DL system developers
• DL system administrators
• DL managers
• DL collection development staff
• DL evaluators
• DL users
63
As data, information, and knowledge play increasingly central roles … digital library research should focus on:
• Increasing the scope and scale of information resources and services;
• Employing context at the individual, community, and societal levels to improve performance;
• Developing algorithms and strategies for transforming data into actionable information;
• Demonstrating the integration of information spaces into everyday life; and
• Improving availability, accessibility, and, thereby, productivity.
64
An appropriate infrastructure program will provide sustainability of digital knowledge resources among five dimensions:
• Acquisition of new information resources;
• Effective access mechanisms that span media type, mode, and language;
• Facilities to leverage the utilization of humankind’s knowledge resources;
• Assured stewardship over humanity’s scholarly and cultural legacy; and
• Efficient and accountable management of systems, services, and resources.
65
DLs: For More Information
• Magazine: www.dlib.org
• Books: http://fox.cs.vt.edu/DLSB.html (1994)
  – MIT Press: Arms, plus by Borgman, Licklider (1965)
  – Morgan Kaufmann: Witten... (several), Lesk (2nd edition)
• Conferences
  – ECDL: www.ecdl2005.org
  – ICADL: http://icadl2004.sjtu.edu.cn
  – JCDL: www.jcdl2005.org
• Associations
  – ASIS&T DL SIG
  – IEEE TCDL: www.ieee-tcdl.org (student awards, doctoral consortia)
• NSF: www.dli2.nsf.gov
• Labs: VT: www.dlib.vt.edu, http://ei.cs.vt.edu/~dlib/