RDF, XML and interoperability
Managing networks : understanding new technologies, Birmingham,
13 September 2001
Pete Johnston
UKOLN, University of Bath
Bath, BA2 7AY
UKOLN is supported by:
[email protected]
http://www.ukoln.ac.uk/
Managing networks: understanding new technologies, Birmingham, 13 Sep 2001
RDF, XML & interoperability
• Metadata: a reprise
• Communities, communication & XML
• An introduction to RDF
• RDF, XML and interoperability
What is metadata?
• “Data associated with objects which relieves their potential users of having to have full advance knowledge of their existence or characteristics. A user might be a program or a person.”
– Dempsey and Heery, 1998
• “Machine understandable information about web resources or other things.”
– Berners-Lee, 1997
• Structured data about resources that can be used to help support a wide range of operations
What resources, objects, things?
• HTML documents
• digital images
• databases
• books
• museum objects
• archival records
• metadata records
• collections
• services
• physical places
• people
• abstract “works”
• concepts
• events
What operations?
• User wants to
  – find, identify, select, obtain/use
• Owner/manager/provider wants to
  – describe
  – enable and control access/use
  – administer
• Different “flavours” of metadata serve different purposes
  – simple, generic vs. rich, specific
Communities & communication
• Effective transmission of information requires agreement on
  – semantics: what terms mean, e.g. “cat”, “to sit”, “mat”
  – structure: the significance of the arrangement of terms, e.g. sentence: subject -> verb -> object (in English...)
  – syntax: rules of expression, e.g. “The cat sat on the mat.”
• A resource description community is defined by consensus on these conventions
Communication using XML (1)
• An example
  – I prepare a music catalogue using the (imaginary!) AlbumCat XML schema
  – I publish my XML document on the Web
  – someone else prepares a catalogue using the same XML schema and publishes their XML document
• I can read their XML document and locate tracks created by Don Van Vliet in their catalogue
• But more importantly...
Communication using XML (2)
User request: Find identifiers of all tracks with creator “Don Van Vliet”
My software can search their document because I have programmed it to map this request onto:
Program action: Find values of dc:identifier attributes of track elements which have a dc:creator child element with content “Don Van Vliet”
Communication using XML (3)
Program action: Find
values of dc:identifier attributes
of track elements
which have a dc:creator child element
with content “Don Van Vliet”
<catalogue xmlns:dc="http://purl.org/dc/elements/1.1/">
  <album dc:identifier="http://pj.org/album/245">
    <dc:title>The Spotlight Kid</dc:title>
    <dc:creator>Don Van Vliet</dc:creator>
    <track dc:identifier="http://pj.org/track/723">
      <dc:title>Grow fins</dc:title>
      <dc:creator>Don Van Vliet</dc:creator>
    </track>
  </album>
</catalogue>
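The program action is knowledge that has to be hard-coded into the searching software. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the `dc:` prefix is bound to the Dublin Core 1.1 namespace, and it writes the creator as “Don Van Vliet” so that a plain string comparison matches the request. The function name `albumcat_track_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the dc: namespace (assumed to be Dublin Core 1.1)
DC = "{http://purl.org/dc/elements/1.1/}"

CATALOGUE = """<catalogue xmlns:dc="http://purl.org/dc/elements/1.1/">
  <album dc:identifier="http://pj.org/album/245">
    <dc:title>The Spotlight Kid</dc:title>
    <dc:creator>Don Van Vliet</dc:creator>
    <track dc:identifier="http://pj.org/track/723">
      <dc:title>Grow fins</dc:title>
      <dc:creator>Don Van Vliet</dc:creator>
    </track>
  </album>
</catalogue>"""


def albumcat_track_ids(xml_text, creator):
    """Hard-coded AlbumCat mapping: dc:identifier attributes of <track>
    elements that have a dc:creator child element with the given content."""
    root = ET.fromstring(xml_text)
    return [
        track.get(DC + "identifier")
        for track in root.iter("track")
        if any(c.text == creator for c in track.findall(DC + "creator"))
    ]


print(albumcat_track_ids(CATALOGUE, "Don Van Vliet"))  # ['http://pj.org/track/723']
```

The function works only because it "knows" two things: identifiers are attributes, and dc:creator is a direct child of track. A document that arranges the same information differently would simply return an empty list.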
Metadata use
• Resource users wish to
  – search across the boundaries of communities
  – combine resources from different communities
• Resource providers wish to
  – exchange descriptions with members of other communities
• Third parties wish to
  – describe resources owned/described by others
• Metadata is
  – used beyond its creator community
  – combined with metadata from other communities
Communication using XML (4)
• Continuing the example
  – a museum describes their holdings using the (imaginary...) ArtCat XML schema and publishes their XML document
• I can read their XML document and locate pictures created by Don Van Vliet listed in their catalogue
  – requires my guesswork and/or reference to the semantics of the ArtCat schema
• But...
Communication using XML (5)
User request: Find identifiers of all “works” with creator “Don Van Vliet”
To search across both catalogues, my software now has to be programmed with two mappings:
Program action (AlbumCat): Find values of dc:identifier attributes of track elements which have a dc:creator child element with content “Don Van Vliet”
Program action (ArtCat): Find content of dc:identifier elements which have a picture parent element with a details child element which has a dc:creator child element with content “Don Van Vliet”
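The cost of two mappings can be seen in a short Python sketch: one function per schema, each encoding that schema's structural conventions. The ArtCat document below is hypothetical. The element names picture and details follow the slide, but the root element `collection` and the museum identifier URI are invented for illustration. The `dc:` prefix is assumed to be Dublin Core 1.1.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # assumed Dublin Core 1.1 namespace

ALBUMCAT = """<catalogue xmlns:dc="http://purl.org/dc/elements/1.1/">
  <album dc:identifier="http://pj.org/album/245">
    <track dc:identifier="http://pj.org/track/723">
      <dc:creator>Don Van Vliet</dc:creator>
    </track>
  </album>
</catalogue>"""

# Hypothetical ArtCat document: root element and identifier URI are invented
ARTCAT = """<collection xmlns:dc="http://purl.org/dc/elements/1.1/">
  <picture>
    <dc:identifier>http://museum.example.org/picture/17</dc:identifier>
    <details>
      <dc:creator>Don Van Vliet</dc:creator>
    </details>
  </picture>
</collection>"""


def albumcat_ids(root, creator):
    # Mapping 1: identifier is an attribute of <track>; dc:creator is a direct child
    return [t.get(DC + "identifier") for t in root.iter("track")
            if any(c.text == creator for c in t.findall(DC + "creator"))]


def artcat_ids(root, creator):
    # Mapping 2: identifier is a child element of <picture>;
    # dc:creator sits one level further down, inside <details>
    return [p.findtext(DC + "identifier") for p in root.iter("picture")
            if any(c.text == creator for c in p.findall("details/" + DC + "creator"))]


def find_works(creator):
    # One hand-written mapping per schema; a third schema means a third function
    return (albumcat_ids(ET.fromstring(ALBUMCAT), creator)
            + artcat_ids(ET.fromstring(ARTCAT), creator))


print(find_works("Don Van Vliet"))
```

Both documents make the same statement ("this work has dc:creator Don Van Vliet"), yet nothing in the code can be shared between the two mappings.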
The problem
• Statement
  – this resource (track, picture... etc!) has dc:creator “Don Van Vliet”
• Multiple expressions in XML
  – different XML schemas make different choices
  – all “good” (and valid)
  – a human reader of the document can interpret them (maybe)
  – a program needs prior “knowledge” of the structural conventions in each XML schema
• Not scalable in an “open” environment
  – how to manage an ever-increasing set of conventions?
  – always encountering unknown schemas
The problem (2)
“XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean.”
– Berners-Lee, 2001
• Consensus on syntax
  – use of XML
• Consensus on semantics of terms
  – meaning of elements/attributes (uniquely named through XML namespaces)
• No consensus on meaning of structure
  – e.g. parent-child element relations
Introducing RDF
• Resource Description Framework (RDF) Model & Syntax
• Recommendation of W3C, 1999
• Generic “architecture” for metadata
  – set of conventions for applications exchanging metadata
  – allows semantics to be defined by different resource description communities
  – accommodates mixing of metadata from diverse sources
Introducing RDF (2)
• Defines
  – a model for making statements about resources
  – conventions for encoding statements using XML syntax
• Object types
  – Resource: any object identified by a URI (not necessarily accessible via the Web)
  – Property: an “attribute” used to describe a resource; properties are also uniquely identified by URIs
  – Statement: a “triple” of a specific resource, a named property, and a value
The RDF model
[Diagram: http://pj.org/doc/1 --author--> “Pete”]
A resource has some property whose value is either (i) a simple string value (literal)…
– The resource identified by the URI http://pj.org/doc/1 has a property “author” whose value is “Pete”
– Or, “Pete” is the “author” of the resource identified by http://pj.org/doc/1
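A statement can be modelled directly as a triple. This is a minimal Python sketch, not an RDF library. The class names are hypothetical, and the full property URI for "author" is an assumption borrowed from the uc: namespace used in the RDF/XML examples in this talk.

```python
from dataclasses import dataclass
from typing import NamedTuple, Union


@dataclass(frozen=True)
class URIRef:
    """A resource or property identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A simple string value."""
    value: str


class Statement(NamedTuple):
    subject: URIRef                 # the resource described
    predicate: URIRef               # the named property
    object: Union[URIRef, Literal]  # the value: a literal or another resource


# "The resource identified by http://pj.org/doc/1 has a property 'author' whose value is 'Pete'"
AUTHOR = URIRef("http://www.ukoln.ac.uk/core/author")  # assumed property URI
stmt = Statement(URIRef("http://pj.org/doc/1"), AUTHOR, Literal("Pete"))
print(stmt.object.value)  # Pete
```

Keeping URIs and literals as distinct types matters. The string "Pete" as a literal is not the same thing as a resource that happens to be named "Pete".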
The RDF model (2)
… or (ii) another resource...
[Diagram: http://pj.org/doc/1 --author--> (resource without a URI); that resource --name--> “Pete” and --email--> “[email protected]”]
– The value of property “author” is another resource which has a property “name” with value “Pete” and a property “email” with value “[email protected]”
The RDF model (3)
… which may itself have a URI
[Diagram: http://pj.org/doc/1 --author--> http://pj.org/person/pete; http://pj.org/person/pete --name--> “Pete” and --email--> (email address)]
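The two variants on these slides differ only in whether the author resource has a URI of its own. In the Python sketch below, the unnamed resource is a "blank node" with a purely local label. The class names, the `describe`/`objects`/`author_names` helpers, the uc: property URIs and the email address are all placeholders for illustration.

```python
from dataclasses import dataclass
from typing import NamedTuple, Union


@dataclass(frozen=True)
class URIRef:
    uri: str


@dataclass(frozen=True)
class BNode:
    """A resource without a URI; the label is only meaningful locally."""
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


class Triple(NamedTuple):
    s: Union[URIRef, BNode]
    p: URIRef
    o: Union[URIRef, BNode, Literal]


UC = "http://www.ukoln.ac.uk/core/"  # assumed namespace for the property names
DOC = URIRef("http://pj.org/doc/1")


def describe(author):
    # The value of "author" is itself a resource with its own properties
    return {
        Triple(DOC, URIRef(UC + "author"), author),
        Triple(author, URIRef(UC + "name"), Literal("Pete")),
        Triple(author, URIRef(UC + "email"), Literal("pete@example.org")),  # placeholder
    }


def objects(graph, subject, predicate):
    return {t.o for t in graph if t.s == subject and t.p == predicate}


def author_names(graph):
    # Follow doc --author--> resource --name--> literal
    return {n for a in objects(graph, DOC, URIRef(UC + "author"))
            for n in objects(graph, a, URIRef(UC + "name"))}


unnamed = describe(BNode("a1"))                        # (ii) another resource
named = describe(URIRef("http://pj.org/person/pete"))  # ... which has its own URI
print(author_names(unnamed) == author_names(named))    # True
```

Only the URI-identified version can be referred to from outside this description. That is what makes the merging shown on the following slides possible.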
The power of RDF
• Extensible model
  – supports any vocabularies
• Supports arbitrary complexity of description
• URIs as unique fixed points to identify
  – resources
  – properties
• Descriptions created independently can be “merged”, using URIs as “anchors”
First source
[Diagram: http://pj.org/doc/1 --author--> http://pj.org/person/pete; http://pj.org/person/pete --name--> “Pete” and --email--> (email address)]
Second source
[Diagram: http://pj.org/doc/1 --subject--> “XML”]
Third source
[Diagram: http://pj.org/person/pete --organisation--> “UKOLN”]
Three descriptions merged
[Diagram: http://pj.org/doc/1 --author--> http://pj.org/person/pete and --subject--> “XML”; http://pj.org/person/pete --name--> “Pete”, --email--> (email address) and --organisation--> “UKOLN”]
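If each description is held as a set of (subject, property, value) triples, merging on URIs is just set union. Identical URI strings from different sources end up denoting the same node. This is a minimal Python sketch under those assumptions: URIs are plain strings, literals are wrapped in a hypothetical `Lit` type, the uc: property URIs are assumed, and the email address is a placeholder.

```python
from typing import NamedTuple


class Lit(NamedTuple):
    """A literal value; resources and properties are plain URI strings."""
    value: str


UC = "http://www.ukoln.ac.uk/core/"  # assumed namespace for the property names
DOC, PETE = "http://pj.org/doc/1", "http://pj.org/person/pete"

# Three independently created descriptions
first = {
    (DOC, UC + "author", PETE),
    (PETE, UC + "name", Lit("Pete")),
    (PETE, UC + "email", Lit("pete@example.org")),  # placeholder address
}
second = {(DOC, UC + "subject", Lit("XML"))}
third = {(PETE, UC + "organisation", Lit("UKOLN"))}

# Merging is set union: the same URI in different sources denotes the same node
merged = first | second | third


def value(graph, subject, prop):
    """The (first) value of a property for a resource, or None."""
    return next((o for s, p, o in graph if s == subject and p == prop), None)


# Answerable only from the merged graph: which organisation is doc/1's author in?
author = value(merged, DOC, UC + "author")
print(value(merged, author, UC + "organisation"))  # Lit(value='UKOLN')
```

No source on its own links the document to UKOLN. The link appears only because the first and third sources both use http://pj.org/person/pete.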
The RDF XML syntax
• XML representation of the model
  – to store/exchange descriptions
• Property names made unique through use of XML namespaces
• Variant XML syntaxes for RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:uc="http://www.ukoln.ac.uk/core/">
  <rdf:Description about="http://pj.org/doc/1">
    <uc:author>Pete</uc:author>
  </rdf:Description>
</rdf:RDF>
The RDF XML syntax (2)
• Using RDF/XML syntax means accepting conventions for the meaning of structures in the XML document
• So, an RDF/XML processor can “know in advance” the meaning of structures
  – even if the description uses unanticipated vocabularies
  – “partial understanding”
• Can read multiple descriptions into a store and “merge” them on URIs
• Will be generated/consumed by software!
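"Partial understanding" can be illustrated with a reader that knows only the RDF/XML structural conventions and nothing about any particular vocabulary. This Python sketch handles only the subset used on these slides: rdf:Description elements with an `about` URI, whose child elements are properties holding either text or a nested rdf:Description. It does not handle blank nodes, rdf:resource, typed nodes or the abbreviated syntaxes. The `parse_rdfxml` name and the example.org vocabulary are invented.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


class Lit(NamedTuple):
    value: str


def parse_rdfxml(text):
    """Read the slide-deck subset of RDF/XML into (subject, property, value) triples.
    Nothing here is specific to any vocabulary."""
    triples = set()

    def walk(desc):
        # Accept the unqualified about="..." used on the slides, or rdf:about
        subject = desc.get("about") or desc.get(RDF + "about")
        for prop in desc:
            if not prop.tag.startswith("{"):
                continue  # RDF/XML property names must be namespace-qualified
            # Any child element is a property; its URI is namespace + local name
            predicate = prop.tag[1:].replace("}", "", 1)
            nested = prop.find(RDF + "Description")
            if nested is not None:
                triples.add((subject, predicate, walk(nested)))
            else:
                triples.add((subject, predicate, Lit((prop.text or "").strip())))
        return subject

    for desc in ET.fromstring(text).findall(RDF + "Description"):
        walk(desc)
    return triples


# A vocabulary the reader has never seen before (invented example.org namespace)
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/unanticipated#">
  <rdf:Description about="http://pj.org/doc/1">
    <ex:colour>blue</ex:colour>
  </rdf:Description>
</rdf:RDF>"""

print(parse_rdfxml(DOC))
```

The reader cannot say what "colour" means. It still knows, from the structure alone, which resource is being described, which property is used and what the value is.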
First source
[Diagram: http://pj.org/doc/1 --author--> http://pj.org/person/pete; http://pj.org/person/pete --name--> “Pete” and --email--> (email address)]
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:uc="http://www.ukoln.ac.uk/core/">
  <rdf:Description about="http://pj.org/doc/1">
    <uc:author>
      <rdf:Description about="http://pj.org/person/pete">
        <uc:name>Pete</uc:name>
        <uc:email>[email protected]</uc:email>
      </rdf:Description>
    </uc:author>
  </rdf:Description>
</rdf:RDF>
Second source
[Diagram: http://pj.org/doc/1 --subject--> “XML”]
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:uc="http://www.ukoln.ac.uk/core/">
  <rdf:Description about="http://pj.org/doc/1">
    <uc:subject>XML</uc:subject>
  </rdf:Description>
</rdf:RDF>
Third source
[Diagram: http://pj.org/person/pete --organisation--> “UKOLN”]
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:uc="http://www.ukoln.ac.uk/core/">
  <rdf:Description about="http://pj.org/person/pete">
    <uc:organisation>UKOLN</uc:organisation>
  </rdf:Description>
</rdf:RDF>
Three descriptions merged
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:uc="http://www.ukoln.ac.uk/core/">
  <rdf:Description about="http://pj.org/doc/1">
    <uc:author>
      <rdf:Description about="http://pj.org/person/pete">
        <uc:name>Pete</uc:name>
        <uc:email>[email protected]</uc:email>
        <uc:organisation>UKOLN</uc:organisation>
      </rdf:Description>
    </uc:author>
    <uc:subject>XML</uc:subject>
  </rdf:Description>
</rdf:RDF>
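One way to check that the hand-merged document really is the merge of the three sources: read each RDF/XML document into triples and compare the sets. This Python sketch reuses a minimal reader for the slide-deck subset of RDF/XML. The reader is not a full RDF/XML parser, `parse_rdfxml` is a hypothetical name, and `pete@example.org` is a placeholder for the email address.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
NS = ('xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
      'xmlns:uc="http://www.ukoln.ac.uk/core/"')


class Lit(NamedTuple):
    value: str


def parse_rdfxml(text):
    """Minimal reader for the RDF/XML subset used on these slides
    (assumes every property element is namespace-qualified)."""
    triples = set()

    def walk(desc):
        subject = desc.get("about") or desc.get(RDF + "about")
        for prop in desc:
            predicate = prop.tag[1:].replace("}", "", 1)  # namespace URI + local name
            nested = prop.find(RDF + "Description")
            value = walk(nested) if nested is not None else Lit((prop.text or "").strip())
            triples.add((subject, predicate, value))
        return subject

    for desc in ET.fromstring(text).findall(RDF + "Description"):
        walk(desc)
    return triples


FIRST = f"""<rdf:RDF {NS}>
 <rdf:Description about="http://pj.org/doc/1">
  <uc:author>
   <rdf:Description about="http://pj.org/person/pete">
    <uc:name>Pete</uc:name>
    <uc:email>pete@example.org</uc:email>
   </rdf:Description>
  </uc:author>
 </rdf:Description>
</rdf:RDF>"""
SECOND = f"""<rdf:RDF {NS}>
 <rdf:Description about="http://pj.org/doc/1"><uc:subject>XML</uc:subject></rdf:Description>
</rdf:RDF>"""
THIRD = f"""<rdf:RDF {NS}>
 <rdf:Description about="http://pj.org/person/pete"><uc:organisation>UKOLN</uc:organisation></rdf:Description>
</rdf:RDF>"""
MERGED = f"""<rdf:RDF {NS}>
 <rdf:Description about="http://pj.org/doc/1">
  <uc:author>
   <rdf:Description about="http://pj.org/person/pete">
    <uc:name>Pete</uc:name>
    <uc:email>pete@example.org</uc:email>
    <uc:organisation>UKOLN</uc:organisation>
   </rdf:Description>
  </uc:author>
  <uc:subject>XML</uc:subject>
 </rdf:Description>
</rdf:RDF>"""

# Merge on URIs: union of the triples read from each source
from_sources = parse_rdfxml(FIRST) | parse_rdfxml(SECOND) | parse_rdfxml(THIRD)
print(from_sources == parse_rdfxml(MERGED))  # True
```

The four XML documents nest their elements differently, but once read as triples the three sources together and the merged document form the same graph.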
A Dublin Core description
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description about="http://www.ukoln.ac.uk/">
<dc:title>UKOLN home page</dc:title>
<dc:creator>Web-support Team, UKOLN</dc:creator>
<dc:subject>digital information management; metadata</dc:subject>
<dc:description>The home page of the UKOLN web site. UKOLN is a
national focus of expertise in digital information management. It
provides policy, research and awareness services to the UK library,
information and cultural heritage communities. UKOLN is based at the
University of Bath.</dc:description>
<dc:publisher>UKOLN</dc:publisher>
<dc:date>2001-09-06</dc:date>
<dc:type>Text</dc:type>
<dc:format>text/html</dc:format>
<dc:format>12809 bytes</dc:format>
</rdf:Description>
</rdf:RDF>
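A consuming application can read such a record with any namespace-aware XML parser. The Python sketch below uses `xml.etree.ElementTree` on an abridged copy of the description above, with the XML declaration omitted. It collects values per DC element, because DC elements such as dc:format may repeat. The `dc_fields` helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Abridged copy of the description above (XML declaration omitted)
RECORD = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://www.ukoln.ac.uk/">
    <dc:title>UKOLN home page</dc:title>
    <dc:publisher>UKOLN</dc:publisher>
    <dc:date>2001-09-06</dc:date>
    <dc:format>text/html</dc:format>
    <dc:format>12809 bytes</dc:format>
  </rdf:Description>
</rdf:RDF>"""


def dc_fields(text):
    """Return {resource URI: {DC element name: [values]}}; DC elements may repeat."""
    result = {}
    for desc in ET.fromstring(text).iter(RDF + "Description"):
        about = desc.get("about") or desc.get(RDF + "about")
        fields = defaultdict(list)
        for el in desc:
            if el.tag.startswith(DC):  # ignore properties from other vocabularies
                fields[el.tag[len(DC):]].append((el.text or "").strip())
        result[about] = dict(fields)
    return result


record = dc_fields(RECORD)["http://www.ukoln.ac.uk/"]
print(record["format"])  # ['text/html', '12809 bytes']
```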
RDF, XML & interoperability
• Why isn’t XML enough?
  – a simple statement could be expressed in XML in many different ways
  – a human reader makes an interpretation/guess
  – an application program requires prior knowledge of the schema/DTD design
• RDF/XML
  – imposes extra syntactic constraints on how a statement is expressed
  – both human and program can interpret the description consistently
• Less flexibility, greater interoperability
RDF, XML & interoperability
• Tentatively...
• Use XML for exchange when
  – both partners (humans, applications) “know” the semantics conveyed by the structure of the (meta)data
• Use RDF/XML for exchange when
  – (meta)data may be used by applications without prior “knowledge” of the specific schema
  – (meta)data incorporates overlapping structures from different domains
• N.B. raises issues of trust
  – who made the statements?
A note of caution
• RDF is not (yet?) a widely adopted technology
• Addresses cross-organisation/domain problems
• Some scepticism?
  – perceived as theoretical, “academic”?
  – also considerable enthusiasm!
• Some revisions to Model & Syntax in progress at W3C
  – XML 1.0 is stable
  – RDF less so
• Limited tools available (at present!)
• But also a growing number of applications
Exercise (optional)
• DC-dot
  – http://www.ukoln.ac.uk/metadata/dcdot/
  – Web-based tool
  – generates DC metadata for Web pages, based on existing <meta> tags, heading content etc.
• Experiment with DC-dot to generate DC metadata for pages of your choice
• View the RDF/XML representations
Acknowledgements
UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
http://www.ukoln.ac.uk/