Beyond Seamless Access: Meta-data in the Age of Content Integration
Spring 2000 Program, Information Technology Interest Group of the Association of College & Research Libraries, New England Chapter
Univ. of Connecticut, May 26, 2000
Amanda Xu
Information Architect, EBSCO, 10 Estes Street, Ipswich, MA 01938
OVERVIEW
• Definitions: meta-data, schemas, and XML linking structures
• Why content integration and analysis? Assumptions about information search and retrieval
• Meta-data applications for content integration and analysis
• How can XML technologies support the interchange, analysis, and personalized delivery of full-text on the Web?
• Role of librarians and information mediators in the wave of content integration
Definitions (1) Meta-data, What is it? [1/6]
Definitions:
1) “Data about data” or “information which describes a data set”
2) Data elements and attributes that facilitate the search and retrieval of a set of associated attributes
Example 1:
• An address label contains: name, address, city, state, zip
• The address might also carry: home or office, access permissions, last updated, internal references
3) A set of semantics that describe the data, classify it, categorize it, and provide instructions on how and where to exploit it
Example 2:
• Standard bibliographic information, summaries, indexing terms, and abstracts
Definitions (1) Meta-data, What is it? [2/6]
Example 3: Simple XML Record
<record>
  <title>The Tao of Pooh</title>
  <author label="personal">Benjamin Hoff</author>
  <date label="1st-published">1982</date>
  <isbn>01400-67477</isbn>
  <publisher>Dutton</publisher>
  <subject label="personal">Winnie the Pooh</subject>
  <subject>Taoism in literature</subject>
  <classification scheme="LCC">PR6025.I65Z68 1983</classification>
</record>
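As a minimal sketch of how a program might read such a record, the following uses Python's standard `xml.etree.ElementTree` module. The helper name `read_record` is an illustrative assumption, and the record is typed with straight quotes so that it parses.

```python
import xml.etree.ElementTree as ET

# The record from Example 3, typed with straight quotes so that it parses.
RECORD_XML = """
<record>
  <title>The Tao of Pooh</title>
  <author label="personal">Benjamin Hoff</author>
  <date label="1st-published">1982</date>
  <isbn>01400-67477</isbn>
  <publisher>Dutton</publisher>
  <subject label="personal">Winnie the Pooh</subject>
  <subject>Taoism in literature</subject>
  <classification scheme="LCC">PR6025.I65Z68 1983</classification>
</record>
"""


def read_record(xml_text):
    """Map each element name to a list of (text, attributes) pairs.

    A list is used because elements such as <subject> may repeat.
    """
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        value = (child.text or "").strip()
        fields.setdefault(child.tag, []).append((value, dict(child.attrib)))
    return fields


fields = read_record(RECORD_XML)
subjects = [text for text, _attrs in fields["subject"]]
print(fields["title"][0][0])
print(subjects)
```

The attributes (`label`, `scheme`) travel with each value, so an application can tell a personal-name subject from a topical one without guessing.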
Definitions (1) Meta-data, What is it? [3/6]
4) Supports understanding of a document, its structure, relationships, locations, and usage
5) Helps you find things or make things disappear
Where is meta-data?
1) Internally:
• Embedded with markup and with content
• Attached as a resource header (HTML META tag) or package
2) Externally:
• Stored separately from its resource
• Generated on demand, e.g. MS SQL Server or Oracle
• Static, e.g. bibliographic record
• Dynamically linked using XLink/XPointer/XPath and ISO HyTime
Definitions (1) Meta-data, What is it? [4/6]
Naming Issues:
Can your meta-data be interchanged and shared with others via computer programs or parsers?
• URI = URN + URL + URC (IETF)
• Namespaces (W3C): qualify element names uniquely and avoid name collisions
• URIs identify the namespaces in use
• XML Namespaces make a name unique, but they do not resolve vocabulary ambiguity
Definitions (1) Meta-data, What is it? [5/6]
Example 4:
<date> used in three different senses:
From George’s document: <date>9-Sept-1999</date>
From Martha’s document: <date>The lovely Deni</date>
From Hadley’s document: <date>Large Plump Medjool</date>
Use namespaces:
<george:date>9-Sept-1999</george:date>
<martha:date>The lovely Deni</martha:date>
<hadley:date>Large Plump Medjool</hadley:date>
Note: Example from Brian Travis’s <Essential_XML> seminar on 11/02/99, Boston
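To illustrate how a parser keeps the three kinds of date apart, here is a small Python sketch. The wrapper element `<dates>` and the `http://example.org/...` namespace URIs are assumptions made for illustration; the slide gives only the prefixes.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: the slide shows prefixes only.
DOC = """
<dates xmlns:george="http://example.org/george"
       xmlns:martha="http://example.org/martha"
       xmlns:hadley="http://example.org/hadley">
  <george:date>9-Sept-1999</george:date>
  <martha:date>The lovely Deni</martha:date>
  <hadley:date>Large Plump Medjool</hadley:date>
</dates>
"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed name to "{namespace-uri}localname",
# so the three <date> elements no longer collide.
by_namespace = {}
for element in root:
    uri, local_name = element.tag[1:].split("}")
    by_namespace[uri] = (local_name, element.text.strip())

# A prefix map lets a query ask for one specific kind of date.
martha_date = root.findtext("martha:date", namespaces={"martha": "http://example.org/martha"})
print(martha_date)
```

The local name is still `date` in all three cases. The namespace makes the names unique, but a program still has to know what each vocabulary means by the word.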
Definitions (1) Meta-data, What is it? [6/6]
Example 5: Simple Dublin Core Record with DC namespace and qualifiers
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<record xmlns:dc="http://purl.org/dc/elements/1.0/"
        xmlns:dcq="http://purl.org/dc/elements/qualifiers/1.0/">
  <dc:title>The Tao of Pooh</dc:title>
  <dc:creator>Benjamin Hoff</dc:creator>
  <dcq:creatorType>Illustrator</dcq:creatorType>
  <dc:date>1982</dc:date>
  <dc:isbn>01400-67477</dc:isbn>
  <dc:publisher>Dutton</dc:publisher>
  <dc:subject>Winnie the Pooh</dc:subject>
  <dc:subject>Taoism in literature</dc:subject>
</record>
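A hedged sketch of reading this record in Python follows. It assumes the second namespace declaration is meant to bind the `dcq` prefix used by `<dcq:creatorType>`. The XML declaration is left out of the string only to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as given on the slide; binding the second one to "dcq"
# is an assumption based on the <dcq:creatorType> element.
NS = {
    "dc": "http://purl.org/dc/elements/1.0/",
    "dcq": "http://purl.org/dc/elements/qualifiers/1.0/",
}

DC_XML = """
<record xmlns:dc="http://purl.org/dc/elements/1.0/"
        xmlns:dcq="http://purl.org/dc/elements/qualifiers/1.0/">
  <dc:title>The Tao of Pooh</dc:title>
  <dc:creator>Benjamin Hoff</dc:creator>
  <dcq:creatorType>Illustrator</dcq:creatorType>
  <dc:date>1982</dc:date>
  <dc:isbn>01400-67477</dc:isbn>
  <dc:publisher>Dutton</dc:publisher>
  <dc:subject>Winnie the Pooh</dc:subject>
  <dc:subject>Taoism in literature</dc:subject>
</record>
"""

root = ET.fromstring(DC_XML)

title = root.findtext("dc:title", namespaces=NS)
subjects = [el.text for el in root.findall("dc:subject", NS)]

# Separate the qualifier elements from the core elements by namespace.
qualifier_prefix = "{" + NS["dcq"] + "}"
qualifiers = {
    el.tag[len(qualifier_prefix):]: el.text
    for el in root
    if el.tag.startswith(qualifier_prefix)
}
print(title, subjects, qualifiers)
```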
Definitions (2) Schemas, What is it? [1/3]
How do you know which meta-data vocabularies you are interchanging?
– Schemas (DTDs):
• understand document elements and structures
• validation/parsing
• schemas support data types (e.g. integer, time, time period), open content models, inheritance, constraints, and namespaces
– Example:
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <xsd:element name="state" type="xsd:string"/>
  <xsd:element name="zip" type="xsd:decimal"/>
  <xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed" value="US"/>
</xsd:schema>
Note: Example from Brian Travis’s tutorial, “XML and Data-Driven Web Architectures”, Seybold Seminars, Boston, Feb. 11, 2000.
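The data-type idea can be sketched in a few lines of Python. This is not an XML Schema validator: it hand-codes only the two element types declared above (`string` and `decimal`), and the `<address>` wrapper and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Declared types, mirroring the two element declarations in the example.
DECLARED_TYPES = {"state": "string", "zip": "decimal"}


def is_valid(type_name, text):
    """Check one value against one of the two supported types."""
    if type_name == "string":
        return True
    if type_name == "decimal":
        try:
            return Decimal(text.strip()).is_finite()
        except InvalidOperation:
            return False
    raise ValueError("unsupported type: " + type_name)


def type_errors(xml_text):
    """Return the names of child elements whose text violates its declared type."""
    root = ET.fromstring(xml_text)
    return [
        el.tag
        for el in root
        if el.tag in DECLARED_TYPES
        and not is_valid(DECLARED_TYPES[el.tag], el.text or "")
    ]


good = type_errors("<address><state>MA</state><zip>01938</zip></address>")
bad = type_errors("<address><state>MA</state><zip>O1938</zip></address>")
print(good, bad)
```

The second document uses the letter O instead of a zero in the zip code, which a typed schema can catch and a DTD cannot.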
Definitions (2) Schemas, What is it? [2/3]
How many types of XML vocabularies are there?
Examples:
1) XML Schema
<xs:schema xmlns:xs="http://www.w3.org/1999/XMLSchema" targetNamespace="http://purl.org/metadata/dublin_core" version="M.n">
...
</xs:schema>
2) RDF
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#"
         xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#"
         xmlns:dc=" ">
Definitions (2) Schemas, What is it? [3/3]
3) Schema repositories: industry-specific
– SOAP, BizCodes, XML-RPC, ICE, CDF, WebDAV, XML/ASN.1, XML/EDI, XER, and Z39.50
– BizTalk.org: routing information
<bizTalk>
  <Route>
    <From locationID="206.247.76.187" locationType="IP" handle="72" process="POConf" Path=""/>
    <To locationID="83-627-54204" locationType="DUNS" handle="14" process="PO_Process" Path=""/>
  </Route>
  <body>
    <purchaseOrder xmlns="urn:schemas-toycat-com:PurchaseOrder.biz" PONumber="10-01-2118"></purchaseOrder>
  </body>
</bizTalk>
Note: Example from Brian Travis’s <Essential_XML> seminar on 11/02/99, Boston
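A receiving application could pull the routing information out of such an envelope roughly as follows. This is a sketch based only on the element and attribute names shown in the example, not on the BizTalk framework's own tooling; `read_route` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ENVELOPE = """
<bizTalk>
  <Route>
    <From locationID="206.247.76.187" locationType="IP" handle="72" process="POConf" Path=""/>
    <To locationID="83-627-54204" locationType="DUNS" handle="14" process="PO_Process" Path=""/>
  </Route>
  <body>
    <purchaseOrder xmlns="urn:schemas-toycat-com:PurchaseOrder.biz" PONumber="10-01-2118"></purchaseOrder>
  </body>
</bizTalk>
"""

# The payload declares its own default namespace, so it must be
# addressed with a namespace-qualified name.
PO_NS = "urn:schemas-toycat-com:PurchaseOrder.biz"


def read_route(xml_text):
    """Extract sender, receiver, and purchase-order number from the envelope."""
    root = ET.fromstring(xml_text)
    route = root.find("Route")
    sender = route.find("From").attrib
    receiver = route.find("To").attrib
    order = root.find("body/{%s}purchaseOrder" % PO_NS)
    return {
        "from": (sender["locationType"], sender["locationID"]),
        "to": (receiver["locationType"], receiver["locationID"]),
        "process": receiver["process"],
        "po_number": order.get("PONumber"),
    }


route_info = read_route(ENVELOPE)
print(route_info)
```

The envelope elements carry no namespace while the payload does, which is what lets a router read the addressing without understanding the purchase order itself.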
Simple Meta-data Interchange Model
[Figure: System A, System B, and System C, each with a DB, exchange an ILL request in XML/EDIFACT. Labels: XML/ASN.1 server (Direct Transfer); XML/ASN.1 server (SMTP); Direct Transfer (protocol, syntax, encoding); template mapping from Sys A to Sys B, then Sys B to Sys C; schema & map Sys C to Sys A. Conversions shown: XML/EDIFACT to ASN.1/BER; ASN.1/BER to XML/BER; Direct Transfer to SMTP; SMTP to Direct Transfer; XML/BER to XML/EDIFACT.]
Definitions (3) Linking Structures [1/6]
[Figure: “My Element” with an attribute list (“my thing”) carrying an XLink root URI and an XPointer address, pointing to a remote schema (DTD) identified by root URI and address.]
Leveraging XML syntax:
Link structures that link an XML name tag to an external standard reference item, and that allow context queries and non-context queries at the element and attribute level
Notes:
XLink specification <http://www.w3.org/TR/xlink>
XPointer specification <http://www.w3.org/TR/xptr>
Definitions (3) Linking Structures [2/6]
[Figure: Application → request linkInfo → API to retrieve link information from the linkbase → Linkbase]
Leveraging the application:
The link structures in which linkInfo participates are returned to the application, which can reassemble them for different purposes on the fly
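The request/response pattern can be sketched as a tiny in-memory linkbase. Everything here (the `Link` and `Linkbase` classes, the `link_info` method, and the `urn:example:` identifiers) is a hypothetical illustration, not an API named in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One link structure: a role plus the resources that participate in it."""
    role: str
    ends: tuple  # identifiers of the participating resources


class Linkbase:
    """A store of links kept outside the documents they connect."""

    def __init__(self, links):
        self._links = list(links)

    def link_info(self, resource):
        """Return every link structure in which `resource` participates."""
        return [link for link in self._links if resource in link.ends]


# Hypothetical identifiers for an article, its author page, and a holdings record.
linkbase = Linkbase([
    Link("author-of", ("urn:example:author/hoff", "urn:example:book/tao-of-pooh")),
    Link("held-by", ("urn:example:book/tao-of-pooh", "urn:example:library/uconn")),
    Link("author-of", ("urn:example:author/hoff", "urn:example:book/te-of-piglet")),
])

book_links = linkbase.link_info("urn:example:book/tao-of-pooh")
roles = sorted(link.role for link in book_links)
print(roles)
```

Because the application receives whole link structures rather than bare URLs, it can regroup them by role (author links in one view, holdings in another) without touching the source documents.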
Definitions (3) Linking Structures [3/6]
Leveraging resources and merging links
Link structures in which the links are merged into the original doc to form a composite document.
[Figure: Original doc + linkbase → API merges the links → Composite doc]
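Merging can be sketched with Python's standard library as well. The sample document, the `id`-keyed linkbase, and the `<link href="...">` element used to hold each merged link are all assumptions for illustration. A real system would more likely address targets with XPointer and emit XLink attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical original document; paragraphs are addressed by id.
ORIGINAL = (
    '<article>'
    '<para id="p1">Taoism in literature</para>'
    '<para id="p2">Winnie the Pooh</para>'
    '</article>'
)

# Hypothetical linkbase kept apart from the document: element id -> target URIs.
LINKBASE = {"p1": ["http://example.org/subjects/taoism"]}


def merge_links(xml_text, linkbase):
    """Return a composite tree: the original plus a <link> child per stored link."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first, because the loop adds new children.
    for element in list(root.iter()):
        for href in linkbase.get(element.get("id"), []):
            ET.SubElement(element, "link", {"href": href})
    return root


composite = merge_links(ORIGINAL, LINKBASE)
print(ET.tostring(composite, encoding="unicode"))
```

The original string is never modified, so the same source can be merged with a different linkbase to produce a different composite.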
Definitions (3) Linking Structures [4/6]
Topic Map:
• “To qualify the content and/or data contained in information objects as topics to enable navigational tools such as indexes, cross-references, citation systems, or glossaries.
• To link topics together in such a way as to enable navigation between them
• To filter an information set to create views adapted to specific users or purposes. For example, such filtering can aid in the management of multilingual documents, management of access modes depending on security criteria, delivery of partial views depending on user profiles and/or knowledge domains, etc.
• To structure unstructured information objects, or to facilitate the creation of topic-oriented user interfaces that provide the effect of merging unstructured information bases with structured ones.”
Note: Quote from Topic Map web site: <http://www.ornl.gov/sgml/sc34/document/0058.htm>
Definitions (3) Linking Structures [5/6]
Leverage Topic Maps
[Figure: topic maps in the search path. Labels: Query; Filter; Profiles (knowledge domains, languages, access rights, delivery views/devices); Category map; DB; Structured docs; Unstructured docs; Link; Cluster; Adaptive categories; Attach categories; Match query; Result set w/ category map; Search/navigate; Topic map.]
Definitions (3) Linking Structures [6/6]
Topic association - Example
<topic id="n001" types="city">
  <topicname><basename>New York City</basename></topicname>
  <mention>adr1 adr2 adr3</mention>
</topic>
<topic id="c98991" types="monument">
  <topicname><basename>Brooklyn Bridge</basename></topicname>
  <mention>adr34 adr3462 adr9832</mention>
</topic>
<assoc type="sightseeing" scope="civil-engineering">
  <when-in>n001</when-in>
  <visit>c98991</visit>
</assoc>
<topic id="city" types="topictypes"></topic>
<topic id="monument" types="topictypes"></topic>
<topic id="civil-engineering"></topic>
<topic id="topictypes"></topic>
Note: Example from Steve R. Newcomb’s tutorial, “Metadata, Schemas, and Linking Structures”, XML World conference, Ottawa, Sept. 13, 1999, updated 5/30/2000.
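To show how an application might navigate from one topic to another through an association, here is a minimal Python sketch. The `<topicmap>` wrapper element and the function `sights_in` are assumptions added so the fragment parses as one document. The topic and association markup follows the example, with the closing tags repaired.

```python
import xml.etree.ElementTree as ET

# The <topicmap> wrapper is an assumption; the slide shows the topics
# and the association without an enclosing element.
TOPIC_MAP = """
<topicmap>
  <topic id="n001" types="city">
    <topicname><basename>New York City</basename></topicname>
    <mention>adr1 adr2 adr3</mention>
  </topic>
  <topic id="c98991" types="monument">
    <topicname><basename>Brooklyn Bridge</basename></topicname>
    <mention>adr34 adr3462 adr9832</mention>
  </topic>
  <assoc type="sightseeing" scope="civil-engineering">
    <when-in>n001</when-in>
    <visit>c98991</visit>
  </assoc>
</topicmap>
"""

root = ET.fromstring(TOPIC_MAP)

# Index topics by id so that associations can be resolved to names.
names = {t.get("id"): t.findtext("topicname/basename") for t in root.findall("topic")}
mentions = {t.get("id"): t.findtext("mention").split() for t in root.findall("topic")}


def sights_in(city_id):
    """Names of topics to visit when in the given city, per 'sightseeing' associations."""
    return [
        names[assoc.findtext("visit")]
        for assoc in root.findall("assoc[@type='sightseeing']")
        if assoc.findtext("when-in") == city_id
    ]


print(names["n001"], "->", sights_in("n001"))
```

The `mentions` index keeps the occurrence addresses for each topic, so the same structure supports both navigation between topics and retrieval of the documents that mention them.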
Why content integration and analysis?
Assumptions about information search and retrieval
Information retrieval is only the first step in information management.
The next step is information analysis and decision support. Information analysis cross-correlates information from multiple, diverse data sources on the net for specific problem solving. Decision support detects, analyzes, and alerts on topics, trends, and events based on the correlated information.
Notes:
Schatz, Bruce R. 1998. “Information Analysis in the Net: The Interspace of the Twenty-First Century.” Visualizing Subject Access for 21st Century Information Resources, edited by Pauline Cochrane and Eric E. Johnson. Univ. of Illinois at Urbana-Champaign.
Evans, David A. 1999. “Beyond Information Retrieval” Workshop, 4th Search Engine Conference, April 9, 1999, Boston, MA.
Meta-data applications for content integration and analysis (1 of 3)
What has it to do with products for the library world?
Today:
– Full-text linking
• ILL/DocDelivery
• ILS linking for holdings
• Publishers & Authors’ Web sites
• Linking services
– Reference linking services provided by CrossRef, SFX, LANL
• Patent data
Tomorrow:
– Users can link directly to any content published by a specific organization simply by highlighting a phrase, sentence, paragraph, or document appearing in any browser, word-processing package, email program, or other application
Meta-data applications for content integration and analysis (2 of 3)
– Interwoven threads for subjects, journal titles, authors, collections
– No document boundary, but an information space where a deeper understanding of knowledge within and across domains is facilitated for specific problem solving and decision support
[Figure: linkbases sit between the sources below and the article, book, journal, and other media collections.]
Subjects
• UMLS
• WordNet
• LCSH
• Lexicons
• Dictionaries
Journal Titles
• Ulrich’s Serials Directory
• LC Serials
• Gale Directory
Authors
• Who’s Who
• Wilson Bibliography
• Gale Contemporary Authors
• Authority files from LC
• Community of Science
Meta-data Applications for Content Integration and Analysis (3 of 3)
Future: decision support and problem solving
[Figure labels: Meta-data standardization; Book directory; Collection directory; Journal directory; Author directory; Bi-directional linking; Collections; Library holdings; ILL/Document delivery; Reference linking; Site-map knowledge-base; Web sites; reviews/annotations; publisher sites; author pages; email/mailing lists; chat rooms/community pages; Authority control.]
How can XML technologies support the interchange, analysis, and personalized delivery of full-text on the Web? (1 of 2)
XML by itself is only a data interchange format. It is the application that makes the data reusable, and thus adds functionality and intelligence to it:
In the beginning --> Editing
Generation X --> Look and feel
Intelligence (SGML/XML) --> Semantics:
• Levels of fragmentation
• Schema recognition, namespace handling
• Linking registration and management
--> Viewing/personalized delivery
--> Interactive services, e.g. B2B
--> Software applications, e.g. re-purposing, concurrent editing
How can XML technologies support the interchange, analysis, and personalized delivery of full-text on the Web? (2 of 2)
XML enables text mining, which has become increasingly fine-grained, subjective, and personal, via:
• extracting information
• counting by type (quantifying)
• categorizing/filtering
• discovering trends
• capturing critical details
• assessing trends
Note:
Evans, David A. 2000. “Text Mining Workshop.” Fifth Search Engines Conference, Boston, MA.
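Two of the operations listed above, counting by type and categorizing, become simple once the text carries markup. The passage and its element names (`person`, `org`, `year`, `subject`) below are invented for illustration and do not come from the workshop.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical marked-up passage; the element names are invented for illustration.
PASSAGE = """
<doc>
  <sentence><person>Benjamin Hoff</person> published with <org>Dutton</org> in <year>1982</year>.</sentence>
  <sentence><person>Winnie the Pooh</person> illustrates <subject>Taoism</subject>.</sentence>
</doc>
"""

root = ET.fromstring(PASSAGE)

# Extracting information: every element that is not structural scaffolding.
entities = [el for el in root.iter() if el.tag not in ("doc", "sentence")]

# Counting by type (quantifying).
counts = Counter(el.tag for el in entities)

# Categorizing/filtering: group the extracted strings under their type.
by_type = defaultdict(list)
for el in entities:
    by_type[el.tag].append(el.text)

print(counts.most_common())
print(by_type["person"])
```

Without the tags, the same counts would require name recognition over raw text. With them, the markup has already done the extraction and the application only aggregates.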
Role of librarians and information mediators in the wave of content integration
Every aspect of librarianship is needed. It is a matter of which parts you would like to participate in.