1
Ontology-based Metadata Schema for
Digital Library Projects in China
Maosheng Lai, Xiudan Yang
Department of Information Management, Peking University
April 28, 2004
2
1 Metadata Application in Chinese Information Resources
Metadata for digital library projects
Metadata Profile for Sharing Information for Sustainable Development of China (MPSISDC)
CELTS-42 CD1.6 (Metadata Standard for Chinese Educational Information)
MICI-DC (Digital Museum Project in Taiwan)
3
Metadata Schemes for Digital Library Projects
Three major types:
National projects
Business operations
Digital projects in university libraries
4
Metadata Application Profiles
General Format for Digitalized Chinese Full-text by Zhongshan Library (the Public Library of Guangdong Province, partly responsible for CPDLP)
Metadata Profiles of CPDLP by Shanghai Library
Chinese Metadata Specifications by National Library of China
5
General Format for Digitalized Chinese Full-text (GFDCF)
Includes format structure, element definitions, and content rules
Based on Dublin Core 1.1
Adds the element “record” as the first element
Defines an HTML tag and a cataloging “label” for each element
Records are issued on the Web
6
Cataloging follows the Chinese Books Cataloging Rules
Develops a full-text retrieval system
Employs object-oriented technology to catalog text, picture, audio-video, computer program, and other types of resources
Guided by the International Standard Bibliographic Description (ISBD), with some revisions for indexing and digitalization
Catalog elements are mapped to elements in other metadata schemes
7
Figure 1: Metadata elements in GFDCF
Record; Title; Creator; Subject, Classification or Keyword; Description; Publisher; Contributor; Date; Type; Format; Identifier; Source; Language; Relation; Coverage; Rights
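The GFDCF slides say that each element has an HTML tag and that records are issued on the Web, but they do not show the tag syntax. The sketch below is only an illustration under stated assumptions: it borrows the common `DC.<element>` convention for Dublin Core in HTML `<meta>` tags, invents a hypothetical `GFDCF.record` name for the added Record element, and shortens "Subject, Classification or Keyword" to `Subject`.

```python
from html import escape

# Element order as listed in Figure 1; "Subject" abbreviates
# "Subject, Classification or Keyword" (shortened for this sketch).
GFDCF_ELEMENTS = [
    "Record", "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier", "Source",
    "Language", "Relation", "Coverage", "Rights",
]


def to_meta_tags(record):
    """Render a record (dict of element -> value) as HTML <meta> tags.

    The "DC." prefix follows the usual Dublin Core-in-HTML convention;
    the "GFDCF." prefix for Record is a hypothetical name, not taken
    from the GFDCF specification.
    """
    lines = []
    for element in GFDCF_ELEMENTS:
        if element not in record:
            continue
        prefix = "GFDCF" if element == "Record" else "DC"
        name = "%s.%s" % (prefix, element.lower())
        content = escape(record[element], quote=True)
        lines.append('<meta name="%s" content="%s">' % (name, content))
    return "\n".join(lines)
```

Calling `to_meta_tags({"Title": "Song Shu", "Record": "001"})` emits the Record tag first, mirroring the rule that Record is the first element.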
8
CPDLP Metadata Profiles
Based on Dublin Core/RDF
Divided into two layers: Dublin Core (low) and MARC or TEI Header (high)
Includes semantics, content rules, syntax structure, and specifications of element qualifiers
Open to multiple metadata schemes for describing different types of resources; for example, CNMARC is mapped to Dublin Core and encapsulated in XML/RDF
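The CNMARC-to-Dublin-Core mapping with XML/RDF encapsulation can be sketched roughly as follows. This is not the CPDLP crosswalk: the field/subfield pairs in `CROSSWALK` are illustrative assumptions based on UNIMARC-style tags, and the function names and sample identifier are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical crosswalk: (field tag, subfield code) -> Dublin Core element.
# The tags are UNIMARC-style assumptions, not taken from the CPDLP profile.
CROSSWALK = {
    ("200", "a"): "title",
    ("701", "a"): "creator",
    ("210", "c"): "publisher",
    ("210", "d"): "date",
    ("101", "a"): "language",
}


def marc_to_dc(fields):
    """Map (tag, subfield, value) triples to (dc_element, value) pairs.

    Fields without a crosswalk entry are silently dropped.
    """
    return [(CROSSWALK[(tag, sub)], value)
            for tag, sub, value in fields
            if (tag, sub) in CROSSWALK]


def to_rdf_xml(about, dc_pairs):
    """Wrap Dublin Core pairs in a minimal RDF/XML description."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("{%s}RDF" % RDF_NS)
    desc = ET.SubElement(root, "{%s}Description" % RDF_NS,
                         {"{%s}about" % RDF_NS: about})
    for name, value in dc_pairs:
        ET.SubElement(desc, "{%s}%s" % (DC_NS, name)).text = value
    return ET.tostring(root, encoding="unicode")
```

A two-layer design like the one on the slide would keep the full MARC record as the high layer and expose only the Dublin Core output of `marc_to_dc` as the low layer.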
9
Chinese Metadata Specifications (CMS)
Issued in March 2002
Adopts the Open Archival Information System (OAIS) as its reference model
Element sets are selected from mature metadata schemes from LC, NLA, Cedars, DC, NEDLIB, etc.
CMS keeps a mapping relation with DC within a consistent metadata framework
The core element set is mandatory and frequently used within the whole specification
Contains most elements of Dublin Core and adds several elements, such as digital rights, preservation, and usage of digital resources
10
Provides a basic extended element set (optional)
CMS qualifiers identify the encoding scheme and refine the meaning of elements, similar to DC qualifiers
Requires that application profiles be constituted according to actual needs and specific resources
The framework structure comprises the core element set, extended element set, semantics and content rules, XML DTD, and RDF Schema
11
Figure 2: Chinese core metadata framework structure
Available at: http://www.cdi.cn/download/dmds.pdf
[Diagram; node labels: Information Package; Preservation description information; Content information; Reference information; Context information; Provenance information; Inherent information; Description information; Resource description; Rights management; Management history; Original history; Identifiers; Structure information; Format description; Rights information; Processing history; Preservation history; Original technical environment]
12
CMS Core Element Sets
Includes seven sets:
Resource Description
Relative Information Objects
Rights Management
Original History
Management History
Inherent Information
Abstract Format Description
13
CMS Elements
Resource Description: Title, Subject, Edition, Abstract, Content type, Language, Coverage, Creator, Contributor, Date of Creation, Publisher, Copyrights holder, Identifier
Relative Information Objects: Related Objects
Rights Management: Digital Publisher Name, Digital Publisher Date, Digital Publisher Place, Rights Warning, Actors, Actions
Original History: Original Technical Environments
Management History: Ingest Process History, Administration History
Inherent Information: Authentication Indicator
Abstract Format Description: UAF-Description
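As a rough illustration of how the seven CMS core sets could be held in a program, the sketch below stores the element names from this slide in a dictionary and sorts a record's elements into their sets. The function names are hypothetical, and the code only checks set membership; it does not model CMS qualifiers, the extended element set, or which individual elements are required.

```python
# Element names copied from the slide; the data structure itself is an
# illustrative assumption, not part of the CMS specification.
CMS_CORE_SETS = {
    "Resource Description": [
        "Title", "Subject", "Edition", "Abstract", "Content type",
        "Language", "Coverage", "Creator", "Contributor",
        "Date of Creation", "Publisher", "Copyrights holder", "Identifier",
    ],
    "Relative Information Objects": ["Related Objects"],
    "Rights Management": [
        "Digital Publisher Name", "Digital Publisher Date",
        "Digital Publisher Place", "Rights Warning", "Actors", "Actions",
    ],
    "Original History": ["Original Technical Environments"],
    "Management History": ["Ingest Process History", "Administration History"],
    "Inherent Information": ["Authentication Indicator"],
    "Abstract Format Description": ["UAF-Description"],
}


def set_of(element):
    """Return the name of the core set containing `element`, or None."""
    for set_name, elements in CMS_CORE_SETS.items():
        if element in elements:
            return set_name
    return None


def group_record(record):
    """Split a flat record into per-set dicts plus a list of unknown elements."""
    grouped, unknown = {}, []
    for element, value in record.items():
        set_name = set_of(element)
        if set_name is None:
            unknown.append(element)
        else:
            grouped.setdefault(set_name, {})[element] = value
    return grouped, unknown
```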
14
Metadata Schemes in Business Companies
21dmedia
Embedded in the warehouse platform
Elements are selected and defined for the database itself and are not based on any existing scheme
Recorded resource types are print materials, such as books, journals, and newspapers
Builds a mapping structure to CNMARC
15
Unihan
Provides a tool, E-Cataloger, which can segment, identify, and convert Chinese words
Core element set is based on Dublin Core
Elements are represented in XML
Records are mapped to CNMARC and embedded in digital objects
Employs Unicode CJK to handle multi-language issues; one product is a Japanese-Korean catalog created with E-Cataloger
16
Metadata Application Profiles in University Libraries
Peking University Rare Book Digital Library (RBDL)
17
RBDL Metadata Elements
Element groups: Core Elements (12), Local Core Elements (2), Unique Elements (1)
RBDL element: DC element
资源形式: Format
题名: Title
主要责任者: Creator
其他责任者: Contributor
出版项: Date, Publisher
版本: (Edition)
外观形态: (Physical Description)
附注说明: Description
收藏历史: (Collection History)
相关文献: Relation
主题词: Subject and Keywords
语种: Language
时空范围: Coverage
古籍标识: Resource Identifier
馆藏信息: Rights Management
18
RBDL metadata schema
RBDL metadata schema has three types of metadata: descriptive metadata, administrative metadata, and GIS metadata.
The element set is divided into three parts: core elements (a generalized component for all kinds of objects, in accordance with DC), local core elements (a common part for local collections), and unique elements (designed for a specific type of object).
The most important function of the metadata profile in RBDL is the metadata standard framework, which can guide the design of metadata schemas for specific areas.
19
Chinese metadata standard framework in RBDL (PKUL)
20
2 Problem Analyses
☆ Lack of unified semantics and content rules for Chinese information resources
☆ Different core elements
☆ Lack of mapping and interoperability
☆ Diversification of Chinese metadata systems
☆ Different semantics and records
☆ Different thesauri
21
3 Ontology-based metadata schema for Digital Library Projects in China
Ontology of Chinese information resources
Ontology of bibliographic relations
Ontology-based digital library metadata schema
22
Ontology of Chinese information resources
23
Ontology of bibliographic relations
FRBR entities: Work, Expression, Manifestation, Item
24
Work is realized through Expression
Expression is embodied in Manifestation
Manifestation is exemplified by Item
Work and Expression: intellectual/artistic content
Manifestation and Item: physical recording of content
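The three FRBR relationships on this slide (is realized through, is embodied in, is exemplified by) link Work to Expression, Expression to Manifestation, and Manifestation to Item. A minimal sketch of that chain as subject-predicate-object triples follows; the `RELATIONS` list and the `chain` helper are illustrative names, not part of FRBR itself.

```python
# FRBR Group 1 relationships as (subject, predicate, object) triples.
RELATIONS = [
    ("Work", "is realized through", "Expression"),
    ("Expression", "is embodied in", "Manifestation"),
    ("Manifestation", "is exemplified by", "Item"),
]


def chain(start):
    """Follow the relationships from `start` down to the most concrete entity.

    Returns one readable statement per step; an entity with no outgoing
    relationship (such as Item) yields an empty list.
    """
    next_step = {subject: (predicate, obj) for subject, predicate, obj in RELATIONS}
    steps = []
    current = start
    while current in next_step:
        predicate, obj = next_step[current]
        steps.append("%s %s %s" % (current, predicate, obj))
        current = obj
    return steps
```

Calling `chain("Work")` walks all three steps, while `chain("Manifestation")` returns only the last one.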
25
“Book” – Song Shu (item)
– “publication” at bookstore (manifestation)
26
“Book” – Who translated? (expression)
– Who wrote? (work)
27
FRBR Entity Levels
Work: The Novel; The Movie
Expression: Orig. Text, Transl., Critical Edition (of the novel); Orig. Version (of the movie)
Manifestation: Paper, PDF, HTML
Item: Copy 1, Copy 2
28
Possible FRBR applications
[Diagram; labels: Authority; Bibliographic; Holding; Item; Work/Expression; Uniform Title; Concept; Manifestation; Person; Series (work/expression); Uniform Title]
29
[Diagram; labels: Work; Expression; Manifestation; Item]
30
31
CONCEPTION
Different from FRBR’s Work, it may be a concept, a plan, or a design for a work, or an abstraction from any particular format.
A conception has a uniform-title, a uniform-name (for an author), other-specific-characteristic, description, keyword(s), topic(s), date, audience, conceptual-level, etc.
32
EXPRESSION
A conception with specified content
An expression also has a title, other-specific-characteristic, date, language, summary, context, critical-response, roles for various rights-owners (e.g. author), use-restrictions, and size.
33
MANIFESTATION
The physical embodiment of an expression of a conception
As it is the physical form, manifestation thus includes manuscripts, books, periodicals, maps, art works, paintings, posters, sound recordings, films, video recordings, CD-ROMs, DVDs, multimedia games, digitalized versions in PDF, HTML, web sites, and so on.
34
Manifestations also have a title (inherited from the expression, but may be a variant), a name for an author, a unique identifier, edition/issue, place-of-publication, serials, provider(s), roles for various rights-owners (e.g. publisher), terms-of-availability, contact, coverage, and update-frequency.
35
DIGITALIZATION
A manifestation encoded in a digital-format in digital libraries
A digitalization has a unique identifier, date, and provenance.
36
INSTANCE
A particular copy of a digitalization
The entity defined as instance is a concrete entity.
An instance has a unique identifier, date, address, access-mechanism, access-restriction, exhibition-history, condition, and treatment-history.
Instances can be grouped into collections, which are the main objects of description and management in today’s metadata schemes.
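The five levels described on these slides (conception, expression, manifestation, digitalization, instance) can be sketched as linked record types. This is a minimal illustration, not the authors' schema: only a few of the listed attributes are modelled, the class and function names are assumptions, and the title fallback reflects the statement that a manifestation's title is inherited from the expression but may be a variant.

```python
from dataclasses import dataclass


@dataclass
class Conception:
    uniform_title: str
    uniform_name: str  # uniform name for the author


@dataclass
class Expression:
    conception: Conception
    title: str
    language: str


@dataclass
class Manifestation:
    expression: Expression
    identifier: str
    title: str = ""  # empty means "inherit the expression's title"

    def display_title(self):
        """Return the variant title if set, else the inherited one."""
        return self.title or self.expression.title


@dataclass
class Digitalization:
    manifestation: Manifestation
    identifier: str
    digital_format: str


@dataclass
class Instance:
    digitalization: Digitalization
    identifier: str
    address: str


def lineage(instance):
    """Trace an instance back through every level to its conception."""
    digitalization = instance.digitalization
    manifestation = digitalization.manifestation
    expression = manifestation.expression
    return {
        "conception": expression.conception.uniform_title,
        "expression": expression.title,
        "manifestation": manifestation.identifier,
        "digitalization": digitalization.identifier,
        "instance": instance.identifier,
    }
```

Each level holds a reference to the level above it, so a single instance record is enough to recover the whole chain.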
37
Ontology-based Chinese digital library metadata schema
[Diagram; labels: Ontology of information resource; Ontology of bibliographic relations; MARC Metadata schema; Classifications Thesaurus; metadata]
38
Name, Description, Keyword, Provider, Rights-owner, Contact, Topic, Audience, Conceptual-level, Language, Coverage, Update-frequency, Access-mechanism(s)
…..
39
Thank You!