Tools for Next Generation of CMS: XML, RDF, & GRDDL

Tools for Next Generation of CMS: XML, RDF, & GRDDL Chimezie Ogbuji (chee-meh) Cleveland Clinic Foundation Cardiothoracic Surgery Research [email protected] / [email protected]


Page 1: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Tools for Next Generation of CMS: XML, RDF, & GRDDL

Chimezie Ogbuji (chee-meh)

Cleveland Clinic Foundation

Cardiothoracic Surgery Research

[email protected] / [email protected]

Page 2: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Background (CT Research Roadmap)

A large, relational registry for Cardiothoracic procedures

Relatively small research department with very little software engineering experience

Traditional CMS and DBMS were insufficient

Initiated a large effort to convert to a metadata-driven XML / RDF repository (SemanticDB)

Need to replace a productive, integrated research pipeline: data entry, clinical Q&A, patient follow-up, concurrent study management, ...

100+ research papers per year

Page 3: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Background (Institute of Medicine Proposal)

The Computer-Based Patient Record: An Essential Technology for Health Care ISBN: 0309055326

Old but very relevant set of requirements by the IOM (still unfulfilled).

A comprehensive attempt to address all the requirements: technological, clinical, procedural, etc.

Can be (completely) addressed with Semantic Web architecture, document processing, and “Web 2.0” architecture.

Page 4: Tools for Next Generation of CMS: XML, RDF, & GRDDL

CPR: Functional Requirements

Uniform, extensible record content

(Standard) record formats

System performance

Linkages

Intelligence

Reporting Capabilities

Security

Multi-views

Accessibility

Page 5: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Definitions: KR / CMS

What is Knowledge Representation (KR)?

What is a Knowledge Base (KB)?

A database system which facilitates deductive reasoning over a KR

Commonly called Rule-based Systems

What are Expert Systems?

What is a Content Management System (CMS)?

Page 6: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Knowledge Representation

Older ideas at corners, newer ideas along sides (Credit: Conrad Barski, M.D.)

Page 7: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Content Management System: The What

The terms CMS and Content Repository are essentially interchangeable

Modern content repositories are best characterized by JSR 170 / 283

“.. a high-level information management system that is a superset of traditional data repositories”

Integrated support for the XPath data model is the most prominent feature (native document management)

Page 8: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Content Repository Feature Set

Modern CMS standards cover document management effectively:

Read/write access

Versioning

Event monitoring

Document-level access control

Concurrent access

Cross-linking

Profiles and Document Types

Page 9: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Anatomy of a JSR 170 Implementation

Jackrabbit

Component-based: Content Applications, Content Repository API, Implementation

Page 10: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Knowledge Bases and CMS

What of the requirements that Expert Systems meet?

Document management and knowledge management systems are historically isolated from each other

XML & RDF are contemporary manifestations of these methodologies

They have remained as isolated as their predecessors

They typically coincide only with regard to syntax

Page 11: Tools for Next Generation of CMS: XML, RDF, & GRDDL

XML & RDF: Eating and Having your Cake

Classic example of where the document-oriented approach falls short: Modern EHR cannot facilitate dynamic research

Unified infrastructure for document and knowledge management is needed

One of the earliest examples: 4Suite Server version 0.10.0 (December 2000)

Current state of the art (GRDDL): Gleaning Resource Descriptions from Dialects of Language

Page 12: Tools for Next Generation of CMS: XML, RDF, & GRDDL

GRDDL: The Elevator Pitch

Provides a way to normalize RDF concrete syntaxes

The problem:

Many RDF concrete syntaxes (RDF/XML, TriX, RDFa, ..)

The authoritative concrete syntax is not without issues

The solution:

Define mappings from XML dialects to RDF graphs

Use Turing-complete XML pipelines

English as a second language analogy
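The idea of mapping an XML dialect to an RDF graph can be sketched in a few lines. This is only an illustration under stated assumptions, not a GRDDL implementation: GRDDL transformations are typically XSLT, whereas this sketch uses Python's standard library to stay self-contained. The `<patient>` dialect, the `http://example.org/` URIs, and the function `glean_triples` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary namespace, used only for illustration.
EX = "http://example.org/vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def glean_triples(xml_text, doc_uri):
    """Map a hypothetical <patient> dialect to a set of (s, p, o) triples.

    The document URI is the subject, the root element name gives the type,
    and each child element becomes one property with a literal value.
    """
    root = ET.fromstring(xml_text)
    triples = {(doc_uri, RDF_TYPE, EX + root.tag.capitalize())}
    for child in root:
        triples.add((doc_uri, EX + child.tag, (child.text or "").strip()))
    return triples


doc = "<patient><name>Jane Doe</name><procedure>CABG</procedure></patient>"
for triple in sorted(glean_triples(doc, "http://example.org/records/1")):
    print(triple)
```

The same source document could be paired with a different mapping to yield a different graph, which is why the choice of transformation is left to the document author.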

Page 13: Tools for Next Generation of CMS: XML, RDF, & GRDDL

The GRDDL Picture

Page 14: Tools for Next Generation of CMS: XML, RDF, & GRDDL

GRDDL: The Components

Faithful Rendition

“By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.”

Various mechanisms for nominating transformations: specific XML attribute, XML Namespaces, HTML Profiles, and XHTML links

GRDDL-aware agents compute GRDDL results (RDF graphs)
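As a rough sketch of the first nomination mechanism only, a GRDDL-aware agent can read the `transformation` attribute in the GRDDL namespace (`http://www.w3.org/2003/g/data-view#`) from the root element of the source document. The helper name `nominated_transformations`, the `<record>` element, and the `example.org` stylesheet URIs are hypothetical; namespace documents, HTML profiles, and XHTML links are not handled here.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def nominated_transformations(xml_text):
    """Return the transformation URIs nominated on the root element.

    The attribute value is a whitespace-separated list of URIs; a document
    without the attribute yields an empty list.
    """
    root = ET.fromstring(xml_text)
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return value.split()


source = (
    '<record xmlns:grddl="http://www.w3.org/2003/g/data-view#" '
    'grddl:transformation="http://example.org/to-rdf.xsl '
    'http://example.org/to-rdf-extra.xsl"/>'
)
print(nominated_transformations(source))
```

An agent would then fetch and apply each nominated transformation and merge the resulting graphs; that step is omitted from the sketch.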

Page 15: Tools for Next Generation of CMS: XML, RDF, & GRDDL

The CMS Alternative: “Dual Representation”

Persist XML in synchrony with its faithful rendition

Changes to the XML trigger calculation and storage of corresponding RDF

“Dual Representation”

Implemented by 4Suite Server Document Definitions

The basis of how we capture patient records with maximum syntactic and semantic expressivity
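The synchrony described above can be illustrated with a small in-memory sketch. It is not the 4Suite Server API: the class `DualRepository`, its methods, and the `simple_definition` mapping are hypothetical names, and a plain Python callable stands in for the document definition.

```python
import xml.etree.ElementTree as ET


class DualRepository:
    """Toy repository that keeps each XML document and its RDF in step."""

    def __init__(self, definition):
        # definition: callable (root element, uri) -> set of triples
        self.definition = definition
        self.documents = {}  # uri -> XML source text
        self.graphs = {}     # uri -> set of (s, p, o) triples

    def put(self, uri, xml_text):
        # Parsing first means malformed XML is rejected before anything is stored.
        root = ET.fromstring(xml_text)
        self.documents[uri] = xml_text
        # Every write recomputes the rendition, replacing any stale triples.
        self.graphs[uri] = self.definition(root, uri)

    def delete(self, uri):
        self.documents.pop(uri, None)
        self.graphs.pop(uri, None)


def simple_definition(root, uri):
    """Hypothetical mapping: one triple per child element of the root."""
    return {(uri, child.tag, (child.text or "").strip()) for child in root}


repo = DualRepository(simple_definition)
record = "http://example.org/records/1"
repo.put(record, "<patient><procedure>CABG</procedure></patient>")
repo.put(record, "<patient><procedure>Valve repair</procedure></patient>")
print(repo.graphs[record])
```

Because the graph is always derived from the current XML, the second `put` leaves no trace of the earlier value in the RDF.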

Page 16: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Document Definition

The document definition is the mapping

Usually an XSLT document

Page 17: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Content Repository Architecture

Page 18: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Overlap between Content Repository APIs

Page 19: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Dual Representation: Advantages

Maximum expressiveness and versatility of content

Unified naming convention and access control (more on this later)

Uniform, concrete RDF syntaxes

For systems which speak XML fluently (XForms, POX over HTTP, WS-*, etc.)

Cheap support for XML & RDF content negotiation

Use of RDF as a semantic index for XML
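Content negotiation becomes cheap because both representations already exist under one URI, so the server only has to pick one. The sketch below assumes a simplified reading of the HTTP `Accept` header (exact media types and `q` values only, no wildcards); the function name `choose_representation` is hypothetical.

```python
def choose_representation(accept_header,
                          available=("application/xml", "application/rdf+xml")):
    """Return the available media type with the highest q-value, or None.

    Simplified on purpose: wildcards such as */* are ignored, and ties go
    to the media type listed first in the header.
    """
    best, best_q = None, 0.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        if media_type in available and q > best_q:
            best, best_q = media_type, q
    return best


print(choose_representation("application/xml;q=0.5, application/rdf+xml;q=0.9"))
```

A repository would then return either the stored XML or a serialization of the stored graph, depending on the result.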

Page 20: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Document Definition: Similarities

GRDDL

RDDL: Resource Directory Description Language

Human-readable descriptive material about a target

A directory of individual resources related to a target

Nature and Purpose

Schema, stylesheet, etc.

Lives at a namespace URI

WXS's targetNamespace

Common theme is a set of definitions for a document or a class of documents

Page 21: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Registering a Document to a Class

Namespace registration works well for the web (preferred approach of W3C TAG)

What if you don't control the content served from the namespace of an existing vocabulary? Atom, DocBook, etc.

A CMS is better suited for a 'closed' / 'controlled' approach

Persist membership metadata in the CMS

Page 22: Tools for Next Generation of CMS: XML, RDF, & GRDDL

SemanticDB and Dual Representation

Page 23: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Document and Graph Granularity

Tying documents to graphs normalizes the content granularity

Documents and their RDF graphs can be treated uniformly: naming convention, targeted querying, access control management

Page 24: Tools for Next Generation of CMS: XML, RDF, & GRDDL

JSR Fine-Grained Control

Page 25: Tools for Next Generation of CMS: XML, RDF, & GRDDL

'Controlled' Naming Convention

Page 26: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Controlled Naming Convention: Continued

RDF Dataset (from SPARQL): A collection of named graphs

The RDF is stored in a graph with the same URI as the XML source document

When RDF is used as the primary cross-document 'index' you can:

SELECT ?graph WHERE { GRAPH ?graph { ... } }

document($graph)/.. XPath ..

The space compromise (of dual representation) can be further mitigated by only extracting a minimal RDF graph
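The two-step pattern above (find graphs by their RDF, then navigate the XML document with the same URI) can be sketched without a SPARQL engine. Everything here is an assumption made for illustration: the dataset is a plain dictionary of named graphs, `graphs_matching` is a hand-written stand-in for the `SELECT ?graph` query, and the records and URIs are invented.

```python
import xml.etree.ElementTree as ET

# XML documents and named graphs share the same URI as their key.
documents = {
    "http://example.org/records/1":
        "<patient><name>Jane Doe</name><procedure>CABG</procedure></patient>",
    "http://example.org/records/2":
        "<patient><name>John Roe</name><procedure>Valve repair</procedure></patient>",
}
# A deliberately minimal graph per document: only the procedure is indexed.
dataset = {
    "http://example.org/records/1":
        {("http://example.org/records/1", "procedure", "CABG")},
    "http://example.org/records/2":
        {("http://example.org/records/2", "procedure", "Valve repair")},
}


def graphs_matching(dataset, predicate, obj):
    """Stand-in for SELECT ?graph WHERE { GRAPH ?graph { ?s predicate obj } }."""
    return sorted(
        uri for uri, triples in dataset.items()
        if any(p == predicate and o == obj for _, p, o in triples)
    )


def names_for_procedure(procedure):
    """Use the RDF as an index, then read detail out of the XML source."""
    names = []
    for graph_uri in graphs_matching(dataset, "procedure", procedure):
        root = ET.fromstring(documents[graph_uri])
        names.extend(element.text for element in root.findall("./name"))
    return names


print(names_for_procedure("CABG"))
```

The patient name is never extracted into RDF, which reflects the point about minimal graphs: the index only needs enough triples to locate the right documents.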

Page 27: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Uniform Access Control for XML/RDF CMS

Traditionally, Access Control Lists are associated with an object

Example: a file or directory in a filesystem

Assign document / graph ACLs to a single URI

Certain users / groups can query the RDF but cannot read the XML

De-identification of EHR: HIPAA

The 4Suite repository supports unified XML/RDF ACL
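A minimal sketch of the single-URI idea follows. It does not reproduce the 4Suite ACL model: the class `UnifiedAcl`, the permission names `query-rdf` and `read-xml`, and the group names are hypothetical.

```python
class UnifiedAcl:
    """Toy ACL store keyed by one URI shared by a document and its graph."""

    def __init__(self):
        self._acl = {}  # uri -> {permission -> set of principals}

    def grant(self, uri, permission, principal):
        self._acl.setdefault(uri, {}).setdefault(permission, set()).add(principal)

    def allowed(self, uri, permission, principal):
        # Anything not explicitly granted is denied.
        return principal in self._acl.get(uri, {}).get(permission, set())


acl = UnifiedAcl()
record = "http://example.org/records/1"
acl.grant(record, "query-rdf", "researchers")
acl.grant(record, "query-rdf", "clinicians")
acl.grant(record, "read-xml", "clinicians")

print(acl.allowed(record, "query-rdf", "researchers"))
print(acl.allowed(record, "read-xml", "researchers"))
```

In this arrangement researchers can query the extracted graph while the identifying XML source stays restricted to clinicians.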

Page 28: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Going Forward

The SPARQL RDF dataset needs to be generalized

There is a long list of representation problems solved by a formal named graph specification

RDF graphs need to be first-class objects in CMS

Build a common Content Repository API for XML / RDF on the JSR 170 / 283 foundation

Where do the 4Suite Repository API and JSR 170 / 283 overlap?

How do we generalize Document Definitions?

Page 29: Tools for Next Generation of CMS: XML, RDF, & GRDDL

A Proposal for XML/RDF CMS

Page 30: Tools for Next Generation of CMS: XML, RDF, & GRDDL

Primary Takeaways

We need to stop thinking of XML & RDF as mutually exclusive solutions to similar problems

CMS standards are needed for the next generation of semantic / rich web applications

These standards can preemptively level the landscape of toolkits in this space

Page 31: Tools for Next Generation of CMS: XML, RDF, & GRDDL

References

D. Nuescheler et al., JSR 170: Content Repository for Java, http://jcp.org/en/jsr/detail?id=170

D. Connolly, Gleaning Resource Descriptions from Dialects of Language, http://www.w3.org/TR/grddl/

J. Borden, T. Bray, Resource Directory Description Language, http://www.rddl.org/

E. Prud'hommeaux, A. Seaborne, SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/

Fourthought Inc., 4Suite, http://4Suite.org