CCEL & XML
A Match Made in (Ethereal) Heaven
The CCEL
• The Christian Classics Ethereal Library
  - Online library of electronic texts
  - Different applications require different formats
  - Which format is authoritative?
• ThML (Theological Markup Language)
  - XML application
  - Master version of most (someday all) texts on ccel.org
ThML
• Defined as an XML application several years ago
• Perl scripts and other “hacks” used to transform ThML into HTML (and other formats) for presentation on the site
  - Static transformation
  - Error prone
  - Difficult to maintain
• Isn’t there a better way?
XSLT
• Extensible Stylesheet Language Transformations
  - Stylesheets written in XML that “explain” to a piece of software called a “transformer” how to change an XML document into a different format (XML or other)
  - Example: Client-side transformations with Internet Explorer 6
XSLT on the CCEL
• Summer of 2001
  - Two students (myself and Jimmy Osborn) wrote XSLT stylesheets that duplicated the functionality of the Perl script transformations
  - From the user’s perspective, the XSLT-transformed docs look no different from the Perl-transformed docs
  - From the developer’s perspective, there is now only one file for every text in the library (the ThML file); all other formats are generated on the fly at the user’s request
• How does this work?
  - Client-side transformations are not powerful enough for our purposes
  - We needed a more powerful, server-side transformer
• Apache Cocoon
  - Java webapp (webapps are servlets with an attitude)
  - Runs inside a Java servlet engine (e.g. Apache Tomcat), which is then connected to a web server
  - Not just a transformer, but an “XML Publishing Framework”
  - Brings together a host of different XML technologies for the purpose of publishing XML docs online
  - Does XSLT transformations, XSL-FO to PDF conversion, renders SVG graphics, executes eXtensible Server Pages (XSP), integrates with databases, washes your dog and more
• Sitemap
  - The sitemap is where Cocoon really shows that it is more than just a kludge of XML-processing programs
  - Using regexp matching (or simple wildcards, if you’re into that sort of thing…), it maps URLs to an XML document, a stylesheet, and a transformer (and a few other things I won’t get into here)
  - Based on the URL, you can match any XML source doc with any XSLT stylesheet and then send the result through any transformer you like!
CCEL Sitemap
URL from User Agent (Web Browser): /ccel/[authorID]/[bookID].[format]
→ Select XML Source: /[authorID]/[bookID].xml
→ Select XSLT Stylesheet: thml.[format].xsl
→ Select Output Transformer: HTML/PDF/OEB/TXT/etc.
→ Resulting Document
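The flow above can be sketched in a few lines of Python. This is only an illustration of the URL-to-pipeline mapping, not Cocoon's sitemap syntax or the CCEL's actual configuration: the `resolve` function, the `Pipeline` type, the `OUTPUTS` table, and the sample author and book IDs are all hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical table from the [format] suffix to an output step. The slide
# lists HTML/PDF/OEB/TXT but does not say how each one is produced.
OUTPUTS = {"html": "HTML", "pdf": "PDF", "oeb": "OEB", "txt": "TXT"}

# Pattern for /ccel/[authorID]/[bookID].[format]
BOOK_URL = re.compile(
    r"^/ccel/(?P<author>[^/.]+)/(?P<book>[^/.]+)\.(?P<format>[^/.]+)$"
)


class Pipeline(NamedTuple):
    source: str      # XML source document
    stylesheet: str  # XSLT stylesheet to apply
    output: str      # output step for the result


def resolve(url: str) -> Optional[Pipeline]:
    """Map a request URL to (source, stylesheet, output), or None if no match."""
    m = BOOK_URL.match(url)
    if m is None:
        return None
    fmt = m.group("format").lower()
    if fmt not in OUTPUTS:
        return None
    return Pipeline(
        source=f"/{m.group('author')}/{m.group('book')}.xml",
        stylesheet=f"thml.{fmt}.xsl",
        output=OUTPUTS[fmt],
    )
```

Under these assumptions, a request for `/ccel/augustine/confessions.html` would resolve to the source `/augustine/confessions.xml`, the stylesheet `thml.html.xsl`, and the HTML output step, while a URL with an unknown format suffix resolves to nothing.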
CCEL Sitemap
• New Problem: People browsing online don’t want the whole text in one big HTML file
  - Cocoon’s sitemap helps us here too…
URL from User Agent (Web Browser) w/ Requested Section: /ccel/[authorID]/[bookID].[sectionID].[format]
→ Select XML Source: /[authorID]/[bookID].xml
→ Select XSLT Stylesheet: page.html.xsl
→ Pass [sectionID] to stylesheet as parameter
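As a rough sketch of the idea, the Python below pulls the section ID out of the URL and uses it to select one part of the book. It stands in for what a stylesheet could do with its parameter; it is not the real page.html.xsl. The sample document, its element names, and the `id` attribute are assumptions made for illustration and are not taken from the ThML definition.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Pattern for /ccel/[authorID]/[bookID].[sectionID].[format]
SECTION_URL = re.compile(
    r"^/ccel/(?P<author>[^/.]+)/(?P<book>[^/.]+)"
    r"\.(?P<section>[^/.]+)\.(?P<format>[^/.]+)$"
)

# Hypothetical stand-in for a book. Element names and the `id` attribute
# are illustrative assumptions, not ThML's actual vocabulary.
SAMPLE_BOOK = """
<book>
  <section id="i"><title>Preface</title></section>
  <section id="ii"><title>Chapter 1</title></section>
</book>
""".strip()


def find_section(book_xml: str, section_id: str) -> Optional[ET.Element]:
    """Return the first element whose id attribute equals section_id."""
    root = ET.fromstring(book_xml)
    for elem in root.iter():
        if elem.get("id") == section_id:
            return elem
    return None


def render_section(url: str, book_xml: str) -> Optional[str]:
    """Serialize only the section named in the URL, or None if not found."""
    m = SECTION_URL.match(url)
    if m is None:
        return None
    # The section ID plays the role of the parameter passed to the stylesheet.
    section = find_section(book_xml, m.group("section"))
    if section is None:
        return None
    return ET.tostring(section, encoding="unicode")
```

Calling `render_section("/ccel/augustine/confessions.ii.html", SAMPLE_BOOK)` returns only the markup for the section with ID `ii`. The `format` group is captured but unused here, since this sketch covers section selection only.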
Current Status
• Still in “beta” stage
  - This new system still has a few bugs, but it will be going online in a preliminary form very soon
  - As more and more documents are converted to well-formed ThML, they can be used by Cocoon, and we will hopefully phase out the old system over time
  - There are SO many documents on the CCEL, this may take some time
Current Status
• New Server
  - Our new Dell PowerEdge 2500 server runs Cocoon (being a Java app, it’s not the most efficient user of system resources)
  - We are currently in the process of moving the entire site over to the PowerEdge
  - When this is complete, the new XML-based system will be available for people to try out
Some examples from the new site…