DIGITAL LIBRARIESgreenstonesupport.iimk.ac.in/greenstone2010/pdf/... · Foreword Digital Libraries...
Feasibility, Features, Functionalities and the Future
Dr. M.G. Sreekumar
UNESCO Coordinator, Greenstone Support, South Asia
Librarian & Head, CDDL, IIM Kozhikode
DIGITAL LIBRARIES
Agenda
• Digital Library – Concepts, Principles and Technologies, Architecture…
• Open (Source) Digital Libraries
• Metadata – Concepts, Functions and Standards
• DL: Functional Components, Workflows & Procedures
• DL: Build up Strategies
• Hardware / Storage / Space
• Software Selection
• DL Architecture
• Major Tasks
• DL Hardships
Foreword
• Digital Libraries gaining increasing social attention, academic and research interest
• Demand for improved information and knowledge management solutions - universities, enterprises and institutions
• Need for integrated access to disparate information resources
• Key challenge - how to create online information environments facilitating internal content publishing and single point access to internal/external information sources
• Latest DL technologies vs traditional libraries and knowledge management
• Fortunately we have a large number of operational digital libraries and services
World of Digital Information: Features
• Great Potential and Dynamic
• Easy to access, disseminate, store, retrieve, archive, copy, transmit ...
• Ubiquity of the Net / Web
• Information - Any time / Anywhere / Anyone
• Access by a wide spectrum of Users
• Easiness of access - Plug & Play
• Currency of the material / information
• Increase in value
Unique Features of the Net/Web
• Reach - unprecedented
• Richness - unquestioned
• Feedback - excellent
• Content Holder
• Content Publisher
• Content Communicator
• Asynchronous
• Death of Distance / Time
Technology Requirements
• Bandwidth
• Communication Speeds
• Processing Power
• World Wide Connectivity
• Application Support
The Current Environment
• Fascinating times in the history of libraries, information systems and electronic publishing
• Possibilities of building large-scale services
• Materials are stored on computers
• Network connects the computers to personal computers on the users' desks
• In a complete digital library, nothing need ever reach paper
Top Tech Trends in IT / LIS
• Web 2.0 / Library 2.0
• Blogs / RSS Feeds / Wikis / Podcasts / Webcasts
• Open Source Software, Open Standards, Open URL
• User Tagging, Automated Tagging
• Web OPACs, and Interface Design
• Seamless Integration / Aggregation
• OA -> OAP + OAA
• Open Resource Discovery Tools - Google Scholar
• E-Books, E-Journals, E-Resources
• Harvesting, Federation, Metasearching
• Digital Rights Management
Multimedia Library Info System
[Diagram labels: Internet / Intranet; Gateway-out; Data capture; USER @ anywhere (access to information from anywhere)]
Challenges of the Day
• Collection Building – Acquisition, Subscriptions, Licensing…
• Diverse Datastreams - Content Categories, Publication Types
• Multimedia, Polymedia, Multiformats
• Copyright, Intellectual Property, Fair Use…
• Technology Complexities, Infrastructure Issues
• Publishers' Stringent Policies / Monopolies
• Integration of legacy systems and the new genre
The Information Landscape
• Popular Information: Books, eBooks, POD, JLs, eJLs, Newspapers, AV media
• Scholarly Information: Books, eBooks, JLs, eJournals, Scholarly Articles, ePrint Archives, ETDs, eCourses
• Digitized Information (DL Initiatives): Commercial, National, State & Local Level, NGOs
• Web Resources: Surface Web, Deep Web, Multi-Modal, Semantic Web
Penetration of E-Content in Libraries
PUBLICATION TYPES
• E-Books, E-Journals…
• Aggregated Scholarly E-Journal Databases
• Databases, CBT/ WBT
• Portals, Vortals…
• Value added services
• Preprints, Eprints, E-Documents….
DOCUMENT FORMATS
• ASCII, RTF, HTML, SGML, Postscript, PDF, Proprietary, Native Application Formats
• Images, Graphics
• Audio
• Video
• XHTML, ASP, PHP, XML ...
What's a DL?
• "Digital libraries are organized collections of digital information. They combine the structuring and gathering of information, which libraries and archives have always done, with the digital representation that computers have made possible." (Michael Lesk)
• "Is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network. A crucial part of this definition is that the information is managed. A stream of data sent to earth from a satellite is not a library. The same data, when organized systematically, becomes a digital library collection." (William Arms)
• Digital library is "a focused collection of digital objects, including text, video, and audio, along with methods for access and retrieval, and for selection, organization, and maintenance of the collection." (Ian Witten and David Bainbridge)
• "Digital libraries are different [from traditional library automation] in that they are designed to support the creation, maintenance, management, access to, and preservation of digital content." (Bernie Hurley, the Director for Library Technologies at U.C. Berkeley. Quoted in Digital library technology trends. Sun Microsystems. August 2002)
What is a “digital library”?
• Traditional user/librarian distinction is blurred
• Computers make information active
• Kitchens for knowledge preparation
• WWW ≠ DL! (organization, selectivity)
• Nice Web site ≠ DL! (import new documents easily)
Collection of digital objects (text, video, audio) along with methods for access and retrieval [user], and for selection, organization, and maintenance [lib]
Ian Witten
Digital Libraries as 'Collections'
Digital Libraries as 'Institutions'
• Digital libraries are organizations that provide the resources, including the specialized staff, towards building and operating DLs
• Digital libraries as a dynamic, growing organism
• As digital libraries evolve and become the predominant mode of access to knowledge and learning, institutionalization of digital libraries appears to be an increasing possibility
Benefits of DLs
• Outreach - Library goes to the user
• Seamless Access - Searching and browsing
• Borderless Dissemination
• Instantaneous and Current
• Always (24X7) available
• Long term preservation…
Limitations of DLs
• Technological obsolescence
  – Hardware
  – Software
• Quite Tender and hence Fragile too
• Security Issues – Being rigorously addressed
• Highly sensitive to Commands – Even minor ignorance or carelessness can be fatal at times
• Resources, Cost, Manpower
• Bandwidth
• Rights Management…
Functional Components
Creation of DLs
Digital Objects
• Digital objects of analogue/physical equivalents: pictures, video clips, music, publications, maps, artifacts (e.g. museum objects), living beings (plants, animals, people), animations, slide shows, print publications, etc. In the case of some of these entities (for example, artifacts like buildings and museum objects, and living beings), digital objects may only carry relevant metadata information and possibly some form of multimedia representation of the entity (e.g. photographs).
• Digital objects that do not have physical counterparts and those created dynamically and in real-time: electronic publications, software, spreadsheets, databases, data gathered from remote sensors, software agents, and live capture of digital versions of speech, music and video.
Space Requirements: For 100,000 Articles (Text) having 5 pages each
Space Requirements: For 100,000 Images (640X480 in 256 colours)
Space Requirements: For 100,000 Audio Recordings (Half Sound, 8 Bit 11 KHz Mono and 16 Bit 44 KHz Stereo, 10 Mins each)
Space Requirements: For 100,000 Video Clips (320X200 and 256 colours at 15 fps)
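The figures behind these space-requirement slides did not survive extraction, so the following is only a rough sketch of how such estimates can be worked out for uncompressed data. The item count, resolutions, bit depths and the 10-minute audio length come from the slide headings; the bytes per text page and the video clip length are hypothetical values chosen purely for illustration, and the nominal 11 kHz and 44 kHz rates are taken literally.

```python
# Rough, uncompressed storage estimates for the collections named on the slides.
ITEMS = 100_000

# Hypothetical assumptions (not stated in the slides):
BYTES_PER_TEXT_PAGE = 3_000   # plain-text page size, illustrative only
CLIP_SECONDS = 60             # video clip length, illustrative only


def image_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed bitmap; 256 colours means 8 bits per pixel."""
    return width * height * bits_per_pixel // 8


def audio_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of one uncompressed PCM recording."""
    return sample_rate_hz * bits_per_sample // 8 * channels * seconds


def video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Size of one uncompressed clip: frame size x frames per second x duration."""
    return image_bytes(width, height, bits_per_pixel) * fps * seconds


estimates = {
    "text articles, 5 pages each": ITEMS * 5 * BYTES_PER_TEXT_PAGE,
    "images 640x480, 256 colours": ITEMS * image_bytes(640, 480, 8),
    "audio 8-bit 11 kHz mono, 10 min": ITEMS * audio_bytes(11_000, 8, 1, 600),
    "audio 16-bit 44 kHz stereo, 10 min": ITEMS * audio_bytes(44_000, 16, 2, 600),
    "video 320x200, 256 colours, 15 fps": ITEMS * video_bytes(320, 200, 8, 15, CLIP_SECONDS),
}

for label, size in estimates.items():
    print(f"{label}: {size / 1e9:,.1f} GB")
```

Under these assumptions the image collection alone comes to about 30.7 GB (307,200 bytes per image times 100,000), and the audio and video collections are far larger. Compression would reduce all of these figures considerably.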
Bandwidth Requirements
Libraries - Shifts
Traditional / Automated
• Organization is physical
• Shelving of documents - Based on Subject Classification
• Key - Index / Catalogues / Cards / Digital Catalogs
• Cards - Real/Virtual - Author, Title, Descriptions
Digital
• Organization in terms of digital files / objects
• Contains material in digitized form
• Contains digital material
• Architecture
• Key - Metadata
Shift in Approaches
• Traditional: AACR2, CCC, CC / LCCS, DDC / UDC, Thesauri / LCSH (Limited / Rigid)
• Automated: AACR2, ISO 2709, CCF, MARC, Thesauri (Improved)
• Dig. Library: Metadata, DCMI - W3C, EAD, TEI, DTD, METS, MODS, Z39.50, MARC21, OAI-PMH (Efficient / Flexible)
Features of Digital Libraries…
• Dynamic Electronic Information Systems
• Seamless Aggregation and Integration of Scholarly Content
• Create / Maintain Local Content
• Strengthens - mechanisms and capacity - Information Systems / Services
• Increase Portability
• Efficiency of Access
• Flexibility
• Availability
• Long term preservation
UNESCO
Special Requirements
• Infrastructure
• Acceptability
• Access Restrictions
• Readability
• Standardization
• Authentication
• Preservation
• Copyright
• User Interface
Need for Content Integration / Organization
• Assuring Seamless Access to the Content
• Need for a single Info. Gateway / Access Point
• Multi - Formats, Media, Platforms (Content / Data in different formats)
• Data encoding (role of markup languages)
• Role of Metadata (role of Standards)
• Structured Metadata (role of XML)
• Need for Interoperability
• Interface / Delivery / Presentation
• Exorbitant cost of proprietary DL S/W
Digital Library Technologies
• Open architectures (Open DLs)
• Componentized vs Monolithic systems
• Interoperability (role of Z39.50, OAI etc.)
• Unified interface for heterogeneous libraries
• Metadata mapping across different libraries
• OAI-compliant data and service providers
• Multilingual digital libraries
• Scalable digital library architectures
• Publication tools
• Searching tools
Software Selection
• Goals and Requirement Specification
• Proprietary Vs Open Source
• Fit the existing Information System
• Accommodate future migration
• Embrace all possible/predominant formats
• Support standard DL technologies/platforms
• Easy installation, population, maintenance
• Comprehensive Documentation
• Software Development Team
• Active User Groups, E-Mail Lists (Users / Developers)
What Distinguishes a DL?
• Site Neutrality (3 in 1 Access: Anytime, Anywhere, by Anyone)
• Open Access
• Greater variety and granularity of information
• Sharing of information 'Sharium'
• Up-to-dateness
• Always available (24X7, 365 days)
• New forms of rendering (New Genre)
Digital Libraries: An Overview
Digital Libraries
Computing Networking Content Collections Services Community
What are digital libraries for?
• Knowledge/content management
  – Manage and access internal information assets
• Scholarly communication, education, research
  – E-journals, e-prints, e-books, data sets, e-learning
• Access to cultural collections
  – Cultural, heritage, historical & special collections, museums, biodiversity
• E-governance
  – Improved access to government policies, plans, procedures, rules and regulations
• Archiving and preservation
• Many more …
DL Software: Alternatives
• What are your expectations?
• Develop local web-based application?
• Commercial DL solution?
• Adopt open source software?
  – Greenstone
  – Eprints
  – DSpace
  – Fedora
  – …
Digital Library Technologies
• Interoperability
• Unified interface for heterogeneous libraries
• Metadata mapping across different libraries
• OAI-compliant data and service providers
• Multilingual digital libraries
• Scalable digital library architectures
• Publication tools
• Searching tools
DLs: Workflows and Processes
• Content selection
• Content acquisition
• Content publishing
  – Metadata preparation
  – Content loading
• Content indexing & storage
• Content access & delivery
• Preservation
• Access management
• Usage monitoring and evaluation
• Networking and interoperation
• Maintenance
DL Software: Key requirements
• Document types (book, journal article, lecture …)
• Document formats (text, PDF, Word, PS, …)
• Content acquisition (online and offline)
  – Metadata description, content tagging
  – Content uploading
• Indexing and retrieval
  – Structured / full text indexing
  – Automatic metadata extraction
• Storage
  – Data compression
  – Efficient storage for metadata
  – Efficient location of metadata and documents
• Access and delivery
  – Structured search, browse, hierarchical browsing
  – CD-ROM distribution
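To make the "full text indexing" requirement concrete, here is a minimal sketch of an inverted index with boolean AND search. It is not how any particular DL package (Greenstone, DSpace, etc.) implements indexing; the function names and the sample documents are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample collection.
documents = {
    "doc1": "Digital libraries are organized collections of digital information",
    "doc2": "Metadata is data about data",
    "doc3": "Open source digital library software",
}

index = build_index(documents)
print(sorted(search(index, "digital libraries")))
```

A production system would add stemming (so that "library" matches "libraries"), ranking, and compressed on-disk storage, which is where the storage requirements in the list come in.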
DL Software: More requirements
• Scaling up – for large collections
• Multilingual support
• Access management and security
• Usage monitoring and reporting
• Standards compliance
  – XML, Dublin Core, Unicode
• Interoperation
  – OAI, Z39.50 compliance, MARC, CDS/ISIS, …
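As an illustration of what OAI compliance means in practice, the sketch below builds an OAI-PMH harvesting request. The six verbs and the `oai_dc` metadata prefix are part of the OAI-PMH 2.0 protocol; the repository base URL and the helper name `oai_request_url` are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, **arguments):
    """Build the URL for an OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    # Skip arguments left as None so optional parameters can be omitted.
    params.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"

# Ask a data provider for all records as unqualified Dublin Core.
url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

A service provider (harvester) would fetch this URL, parse the XML response, and follow any resumption token to page through large result sets.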
Traditional Library Standards: MARC
History:• Originally devised by the Library of Congress, 1966: MARC 1
• Format designed with magnetic tape in mind!
• 1967/8 expanded through collaboration with British Library
• Led to two broad versions: UK … subfields …
• Many international variations: tend to follow US MARC or UK MARC
• Used as an exchange format or a communication format
USMARC DANMARCCAN/MARC UNIMARC FINMARC UKMARC CHINA-MARC
MARC21
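The tape-oriented exchange format mentioned above is the ISO 2709 record structure: a 24-character leader, a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit starting offset), and then the field data. The following is a simplified sketch of that layout, assuming ASCII-only data and placeholder leader values; the helper names and the sample record are hypothetical, and real MARC processing would normally use a dedicated library.

```python
FIELD_END = "\x1e"    # field terminator
RECORD_END = "\x1d"   # record terminator
SUBFIELD = "\x1f"     # subfield delimiter


def build_record(fields):
    """Serialize (tag, data) pairs into an ISO 2709-style record (ASCII only)."""
    directory, body, offset = "", "", 0
    for tag, data in fields:
        data += FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(data):04d}{offset:05d}"
        body += data
        offset += len(data)
    base = 24 + len(directory) + 1     # leader + directory + its terminator
    total = base + len(body) + 1       # plus the record terminator
    # Leader: record length (0-4), placeholder codes, base address (12-16).
    leader = f"{total:05d}nam  22{base:05d}   4500"
    return leader + directory + FIELD_END + body + RECORD_END


def parse_record(record):
    """Recover the (tag, data) pairs by walking the directory."""
    base = int(record[12:17])
    directory = record[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = record[base + start:base + start + length].rstrip(FIELD_END)
        fields.append((tag, data))
    return fields


# Hypothetical sample: a control number and a title field with subfield "a".
sample_fields = [
    ("001", "demo0001"),
    ("245", "10" + SUBFIELD + "aDigital libraries"),
]
record = build_record(sample_fields)
print(parse_record(record) == sample_fields)
```

Because every length and offset is stored up front, a reader can skip through a sequential medium such as tape without scanning the field contents, which is the design constraint the slide alludes to.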
Metadata
General Definition
• Metadata in its broadest sense is data about data
• Documentation about documents and objects
• Describing (Tagging) the contents of the object
• For Information Discovery from the Resource Base
Internet context
• Data describing the attributes of an electronic resource on the net
• Dublin Core (DCMI) – WWW Consortium Standard
• XML - The tool
Dublin Core Metadata Elements
The Basics: 15 Elements
Element groups: Content, Responsibility, Manifestation

Metadata: Definition
• Title: The name given to the resource by the creator or publisher
• Creator: The person responsible for the intellectual content of the resource
• Subject: The topic of the resource
• Description: A textual description of the content of the resource
• Publisher: The entity responsible for making the resource available
• Contributor: A person or organization (other than the Creator) who is responsible for making significant contributions to the intellectual content of the resource
• Date: A date associated with the creation or availability of the resource
• Type: The nature or genre of the content of the resource
• Format: The physical or digital manifestation of the resource
• Identifier: An unambiguous reference that uniquely identifies the resource within a given context
• Source: A reference to a second resource from which the present resource is derived
• Language: The language of the intellectual content of the resource
• Relation: A reference to a related resource, and the nature of its relationship
• Coverage: Spatial locations and temporal durations characteristic of the content of the resource
• Rights: Information about rights held in the resource
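As a sketch of how the Dublin Core elements are typically expressed in XML, the example below builds a small record with Python's standard library. The namespace URI is the standard Dublin Core elements namespace; the `record` wrapper element and the helper name are hypothetical, and the sample values (taken loosely from this presentation's own title slide, with an assumed format and language) are illustrative only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields):
    """Build an XML element holding one dc:* child per supplied field."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Illustrative values; format and language are assumptions.
record = dublin_core_record({
    "title": "Digital Libraries: Feasibility, Features, Functionalities and the Future",
    "creator": "Sreekumar, M.G.",
    "type": "Text",
    "format": "application/pdf",
    "language": "en",
})
print(ET.tostring(record, encoding="unicode"))
```

All fifteen elements are optional and repeatable in Dublin Core, which is why the helper accepts any subset of them.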
DL - Hardships
• Copyright Issues
• Technology Complexities
• Infrastructure Issues
• Publications/Formats – Diverse Datastreams
• Digital Objects/Formats - Multiple
• Publishers' Policies – Stringent, Inconsistent

Major Tasks
• Content identification (internal / external)
• Content Creation
• Content Collation/Signposts
• Organisation
• Updation
• Retrieval / Dissemination
• User Training
• Archiving
DIGITAL LIBRARY ARCHITECTURE
[Diagram labels: Data/Objects; METS/MODS; EAD; TEI; DCMI; OS; Z39.50 / OAI-PMH; Network; DL Software]