GRDDL: The Why, What, How, and Where
![Page 1: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/1.jpg)
GRDDL: The Why, What, How, and Where
Chimezie Ogbuji, Cleveland Clinic Foundation
![Page 2: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/2.jpg)
GRDDL: The Acronym
Gleaning Resource Descriptions (from) Dialects (of) Languages
Rather long and intimidating
![Page 3: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/3.jpg)
GRDDL: By Deconstruction
WordNet definition of "glean":
◦ (gather, as of natural products)
◦ Synonyms: reap, harvest.
Resource Description Framework (RDF)
◦ Logical assertions
Dialects of Language
◦ XML document families (XHTML, for instance)
![Page 4: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/4.jpg)
GRDDL: By Analogy
GRDDL can be thought of as a protocol for sowing semantics in web content for later harvest.
![Page 5: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/5.jpg)
The Why
Vast amount of latent semantics in markup
Web content today is primarily built for human consumption
Text indexing will only get you so far for document retrieval
If machines are meant to harvest RDF from documents, reproducible protocols are needed
![Page 6: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/6.jpg)
The Why (Cont.)
Microformats, eRDF, and RDFa
◦ Specific to a particular family of documents
◦ XHTML and HTML
If the goal is machine consumption, the bar needs to be raised beyond XHTML
![Page 7: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/7.jpg)
The Why (Cont.)
It seems easy to forget that XHTML is indeed an XML dialect
◦ You would think the (X) would make that obvious
What was needed was a standard way to harvest RDF that is applicable to all XML dialects
![Page 8: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/8.jpg)
The What
Faithful rendition
Transformations
GRDDL result
Source documents
GRDDL-aware agents
![Page 9: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/9.jpg)
Faithful Rendition
“By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.”
Licenses an author-certified interpretation of an XML document
A powerful paradigm for messaging
◦ See David Booth's “RDF and SOA”: http://www.w3.org/2007/01/wos-papers/booth
![Page 10: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/10.jpg)
GRDDL Transformations
Functions that take an XML document and return an RDF graph
Transformations can be written in any language
The “reference” transformation language is XSLT
◦ “[XSLT1] is the format most widely supported by GRDDL-aware agents as of this writing […] is specifically designed to express XML to XML transformations and has some good safety characteristics”
![Page 11: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/11.jpg)
Other Transformation Languages
“.. technically Javascript, C, or virtually any other programming language may be used to express transformations for GRDDL”
However, these transformations need to be deterministic in order to ensure the result is a faithful rendition
Hence, they must be functions
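
The "transformation as a function" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard library instead of XSLT, represents the RDF graph as a set of plain (subject, predicate, object) tuples, and the `people`/`person` dialect and the `http://example.org/terms#` vocabulary are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary, invented for this illustration only.
EX = "http://example.org/terms#"


def transform(xml_text, base_uri):
    """Map an XML document to a set of (subject, predicate, object) triples.

    The result depends only on the arguments (no clock, no randomness,
    no network access), so the same document always yields the same graph.
    """
    root = ET.fromstring(xml_text)
    triples = set()
    for person in root.iter("person"):
        person_id = person.get("id")
        name = person.findtext("name")
        if person_id is None or name is None:
            continue  # nothing to assert about an incomplete record
        triples.add((base_uri + "#" + person_id, EX + "name", name))
    return frozenset(triples)


# Hypothetical source document in the made-up dialect.
SOURCE = """<people>
  <person id="p1"><name>Ada</name></person>
  <person id="p2"><name>Grace</name></person>
</people>"""

graph = transform(SOURCE, "http://example.org/people")
```

Returning a `frozenset` keeps the result immutable and order-free, which makes it easy to check that two runs over the same input agree.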
![Page 12: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/12.jpg)
GRDDL Result
The result of applying the transformation is an RDF serialization
The RDF graph that corresponds to the serialization is a GRDDL result of the original document
The “reference” result format is RDF/XML
◦ Other formats can be used (Turtle, N3, etc.)
![Page 13: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/13.jpg)
GRDDL Source Documents
The class of documents for which GRDDL defines a way to extract a result graph:
◦ XML Documents
◦ XML Namespace Documents
◦ Valid XHTML
◦ XHTML Profiles
![Page 14: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/14.jpg)
GRDDL Source Documents
![Page 15: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/15.jpg)
GRDDL: XML Documents
GRDDL Namespace (grddl prefix)
◦ http://www.w3.org/2003/g/data-view#
transformation attribute

    <?xml version="1.0" encoding="UTF-8"?>
    <root
        xmlns:grddl="http://www.w3.org/2003/g/data-view#"
        grddl:transformation=".. path to transform ..">
      … XML content ..
    </root>
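
A GRDDL-aware agent has to read that attribute off the root element. The following is a minimal sketch using Python's standard library, assuming the attribute value is a whitespace-separated list of URI references that are resolved against the document's base URI. The function name, the sample document, and the `example.org` URIs are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def root_transformations(xml_text, base_uri):
    """Return the absolute transformation URIs named on the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced attribute names as "{namespace}local".
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return [urljoin(base_uri, ref) for ref in value.split()]


# Hypothetical source document naming one relative and one absolute URI.
DOC = """<root xmlns:grddl="http://www.w3.org/2003/g/data-view#"
      grddl:transformation="glean.xsl http://example.org/other.xsl">
  <item/>
</root>"""

found = root_transformations(DOC, "http://example.org/docs/data.xml")
```

A document without the attribute simply yields an empty list, so the caller can fall through to the other source-document classes.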
![Page 16: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/16.jpg)
Namespace Documents
“Transformations can be associated not only with individual documents but also with whole dialects that share an XML namespace”
A GRDDL source document lives at the location of the namespace URI of the root element (the namespace document)
The GRDDL result of the namespace document has a statement of the form:
?nsDoc grddl:namespaceTransformation ?txDoc
• txDoc is the location of a transformation applicable to such XML documents
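
The namespace lookup can be sketched in two steps: find the namespace URI of the root element, then select the matching statements from the GRDDL result of the namespace document. In this sketch that result is a hand-written set of tuples standing in for a real graph, the subject is assumed to match the namespace URI exactly, and the `po` dialect and all `example.org` URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"
NS_TRANSFORMATION = GRDDL_NS + "namespaceTransformation"


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag
    # ElementTree spells qualified names as "{namespace}local".
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def namespace_transformations(ns_uri, ns_doc_triples):
    """Select ?txDoc from '?nsDoc grddl:namespaceTransformation ?txDoc'."""
    return sorted(o for s, p, o in ns_doc_triples
                  if s == ns_uri and p == NS_TRANSFORMATION)


# Hypothetical instance document and namespace-document result.
ORDER = '<po:order xmlns:po="http://example.org/po"><po:item/></po:order>'
NS_DOC_RESULT = {
    ("http://example.org/po", NS_TRANSFORMATION, "http://example.org/po2rdf.xsl"),
    ("http://example.org/po", "http://www.w3.org/2000/01/rdf-schema#label", "Purchase orders"),
}

ns_uri = root_namespace(ORDER)
tx_docs = namespace_transformations(ns_uri, NS_DOC_RESULT)
```

Obtaining `NS_DOC_RESULT` in practice means treating the namespace document as a GRDDL source document in its own right, which is where the recursion comes from.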
![Page 17: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/17.jpg)
Valid XHTML Documents

    <html xmlns="http://www.w3.org/1999/xhtml">
      <head profile="http://www.w3.org/2003/g/data-view">
        <title>Some Document</title>
        <link rel="transformation"
              href=".. path to transformation .." />
        ...
      </head>
      …
    </html>

The profile attribute refers to the GRDDL XHTML profile
◦ Licenses the interpretation of rel="transformation" links
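
The XHTML case can be sketched the same way: the links only count if the `head` element names the GRDDL profile. The sketch below is a simplification that assumes well-formed XHTML parseable by the standard library, treats both `profile` and `rel` as space-separated lists, and only looks at `link` elements inside `head`. The function name and sample page are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def xhtml_transformations(xhtml_text, base_uri):
    """Return transformation URIs licensed by the GRDDL profile, if present."""
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML + "head")
    if head is None:
        return []
    # Without the GRDDL profile the rel="transformation" links carry no license.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    found = []
    for link in head.iter(XHTML + "link"):
        rel_tokens = link.get("rel", "").split()
        href = link.get("href")
        if "transformation" in rel_tokens and href:
            found.append(urljoin(base_uri, href))
    return found


# Hypothetical page; the stylesheet link must be ignored.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation" href="glean.xsl" />
    <link rel="stylesheet" href="style.css" />
  </head>
  <body/>
</html>"""

links = xhtml_transformations(PAGE, "http://example.org/page.html")
```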
![Page 18: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/18.jpg)
XHTML Profiles
“Adding a GRDDL profileTransformation assertion to a profile document is much like adding a namespaceTransformation assertion to a namespace document”
A GRDDL source document lives at the location of the profile URI of an XHTML document (the profile document)
The GRDDL result of the profile document has a statement of the form:
?profileDoc grddl:profileTransformation ?txDoc
• txDoc is the location of a transformation applicable to such XML documents
![Page 19: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/19.jpg)
The How
GRDDL builds on existing XML & RDF standards
An implementation mostly needs to orchestrate:
◦ Parsing of data representations
◦ Resolving representations from web locations
◦ The necessary XML processing to peek into and harvest RDF from the various sources
◦ The highly recursive nature of GRDDL
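
The recursion can be sketched as follows. This is not how GRDDL.py is structured; it is a toy under stated assumptions: fetching, link discovery, and transformation are passed in as callables, graphs are sets of tuples, and a document that is already being processed higher up the call chain contributes nothing, which prevents endless loops. All names and `example.org` URIs are hypothetical.

```python
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"
INDIRECT = {GRDDL_NS + "namespaceTransformation",
            GRDDL_NS + "profileTransformation"}


def glean(uri, fetch, direct, authorities, apply_tx, active=frozenset()):
    """Return the triples gleaned from `uri`.

    fetch(uri) returns a document; direct(doc) lists transformation URIs
    named in the document itself; authorities(doc) lists its namespace and
    profile documents; apply_tx(tx_uri, doc) returns a set of triples.
    """
    if uri in active:  # already in progress higher up: break the cycle
        return set()
    active = active | {uri}
    doc = fetch(uri)
    tx_uris = list(direct(doc))
    for auth_uri in authorities(doc):
        # A namespace or profile document is itself a GRDDL source document.
        auth_triples = glean(auth_uri, fetch, direct, authorities, apply_tx, active)
        tx_uris += sorted(o for s, p, o in auth_triples
                          if s == auth_uri and p in INDIRECT)
    result = set()
    for tx_uri in tx_uris:
        result |= apply_tx(tx_uri, doc)
    return result


# A toy "web": DOC names no transformation itself, but its namespace does.
DOC, NS = "http://example.org/doc", "http://example.org/ns"
WEB = {
    DOC: {"direct": [], "authorities": [NS]},
    NS: {"direct": ["http://example.org/ns.xsl"], "authorities": []},
}
OUTPUT = {
    ("http://example.org/ns.xsl", NS):
        {(NS, GRDDL_NS + "namespaceTransformation", "http://example.org/doc.xsl")},
    ("http://example.org/doc.xsl", DOC):
        {(DOC, "http://example.org/terms#title", "Hello")},
}


def fetch(uri):
    return (uri, WEB[uri])


def direct(doc):
    return doc[1]["direct"]


def authorities(doc):
    return doc[1]["authorities"]


def apply_tx(tx_uri, doc):
    return OUTPUT.get((tx_uri, doc[0]), set())


triples = glean(DOC, fetch, direct, authorities, apply_tx)
```

Passing the collaborators in as arguments keeps the recursion testable without network access; a real implementation would plug in HTTP retrieval and an XSLT processor.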
![Page 20: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/20.jpg)
Technological Overlap
![Page 21: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/21.jpg)
Anatomy of a GRDDL Implementation: GRDDL.py
A reference implementation from scratch
◦ 650 LOC
◦ RDFLib, 4Suite-XML, and Python control logic
A layered approach
◦ Core module that handles transformations
◦ One module per source type stacked on top of the core
◦ A top layer that orchestrates the recursion and identification of which ‘class’ a source document belongs to
![Page 22: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/22.jpg)
GRDDL.py Core
![Page 23: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/23.jpg)
Component Stack
![Page 24: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/24.jpg)
The Where
GRDDL services online:
◦ http://triplr.org/ (Stuff in, triples out)
◦ http://www.w3.org/2007/08/grddl/ (W3C GRDDL Service)
Primary GRDDL implementations:
◦ Redland
◦ GRDDL.py
◦ Virtuoso
◦ GRDDL Reader for Jena
RDFa is the most common GRDDL source content format in the wild
![Page 25: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/25.jpg)
Hidden Value Proposition
Supports separation of concerns:
◦ XML for messaging, data collection, structural validation
◦ RDF for expressive assertions, inference, etc.
A way to invest in data richness and accessibility
![Page 26: GRDDL: The Why, What, How, and Where](https://reader033.fdocuments.in/reader033/viewer/2022052823/5552c15cb4c90581158b47c9/html5/thumbnails/26.jpg)
GRDDL Use Cases
Embedding scheduling assertions on personal pages
Using GRDDL for extracting RDF from XML medical record documents
◦ Cleveland Clinic use case (clinical research)
Aggregating web-based product reviews
Embedding web service descriptions
Adding semantic assertions to XML schemas
Embedding semantic assertions in wikis