CCEL & XML

A Match Made In (Ethereal) Heaven

The CCEL

• The Christian Classics Ethereal Library
  - Online library of electronic texts
  - Different applications require different formats
  - Which format is authoritative?

• ThML (Theological Markup Language)
  - XML application
  - Master version of most (someday all) texts on ccel.org

ThML

• Defined as an XML application several years ago

• Perl scripts and other “hacks” used to transform ThML into HTML (and other formats) for presentation on site
  - Static transformation
  - Error-prone
  - Difficult to maintain

• Isn’t there a better way?

XSLT

• Extensible Stylesheet Language Transformations
  - Stylesheets written in XML that “explain” to a piece of software called a “transformer” how to change an XML document into a different format (XML or other)
  - Example: client-side transformations with Internet Explorer 6

XSLT on the CCEL

• Summer of 2001
  - Two students (Jimmy Osborn and I) wrote XSLT stylesheets that duplicated the functionality of the Perl script transformations
  - From the user’s perspective, the XSLT-transformed docs look no different from the Perl-transformed docs
  - From the developer’s perspective, there is now only one file for every text in the library (the ThML file); all other formats are generated on the fly at the user’s request

• How does this work?
  - Client-side transformations are not powerful enough for our purposes
  - We needed a more powerful, server-side transformer

• Apache Cocoon
  - Java webapp (webapps are servlets with an attitude)
  - Runs inside a Java servlet engine (e.g. Apache Tomcat), which is then connected to a web server
  - Not just a transformer, but an “XML Publishing Framework”
    - Brings together a host of different XML technologies for the purpose of publishing XML docs online
    - Does XSLT transformations, converts XSL-FO to PDF, renders SVG graphics, executes eXtensible Server Pages (XSP), integrates with databases, washes your dog, and more

• Sitemap
  - The sitemap is where Cocoon really shows that it is more than just a kludge of XML-processing programs
  - Using regexp matching (or simple wildcards, if you’re into that sort of thing…), it maps URLs to an XML document, a stylesheet, and a transformer (and a few other things I won’t get into here)
  - Based on the URL, you can match any XML source doc with any XSLT stylesheet and then send the result through any transformer you like!

CCEL Sitemap

1. URL from user agent (web browser): /ccel/[authorID]/[bookID].[format]
2. Select XML source: /[authorID]/[bookID].xml
3. Select XSLT stylesheet: thml.[format].xsl
4. Select output transformer: HTML/PDF/OEB/TXT/etc.
5. Resulting document
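The mapping on this slide can be sketched in a few lines of Python. This only illustrates the URL-to-pipeline idea; it is not Cocoon's sitemap syntax or the CCEL's actual configuration. The names `Pipeline`, `BOOK_URL`, and `resolve`, and the characters the regular expression allows in each ID, are assumptions made for the example.

```python
import re
from typing import NamedTuple, Optional


class Pipeline(NamedTuple):
    source: str      # ThML file to read
    stylesheet: str  # XSLT stylesheet to apply
    output: str      # output format requested by the user


# /ccel/[authorID]/[bookID].[format]
# Assumption: IDs and formats contain neither "/" nor ".".
BOOK_URL = re.compile(
    r"^/ccel/(?P<author>[^/.]+)/(?P<book>[^/.]+)\.(?P<fmt>[^/.]+)$"
)


def resolve(url: str) -> Optional[Pipeline]:
    """Map a request URL to its source file, stylesheet, and output format."""
    m = BOOK_URL.match(url)
    if m is None:
        return None  # URL does not follow the whole-book pattern
    return Pipeline(
        source=f"/{m['author']}/{m['book']}.xml",
        stylesheet=f"thml.{m['fmt']}.xsl",
        output=m["fmt"].upper(),
    )
```

With a made-up author and book ID, `resolve("/ccel/augustine/confessions.pdf")` would select `/augustine/confessions.xml` as the source and `thml.pdf.xsl` as the stylesheet. Every format is served from the same ThML file.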

CCEL Sitemap

• New problem: people browsing online don’t want the whole text in one big HTML file
  - Cocoon’s sitemap helps us here too…

1. URL from user agent (web browser) with requested section: /ccel/[authorID]/[bookID].[sectionID].[format]
2. Select XML source: /[authorID]/[bookID].xml
3. Select XSLT stylesheet: page.html.xsl
4. Pass [sectionID] to the stylesheet as a parameter
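A rough Python sketch of the section case follows, again as an illustration rather than the real sitemap. The names `SECTION_URL` and `resolve_section`, the parameter key `sectionID`, and the rule that IDs contain no dots or slashes are all assumptions. Following the slide, the sketch always selects `page.html.xsl` and does not use the format part of the URL.

```python
import re
from typing import Dict, Optional

# /ccel/[authorID]/[bookID].[sectionID].[format]
# Assumption: IDs and formats contain neither "/" nor ".".
SECTION_URL = re.compile(
    r"^/ccel/(?P<author>[^/.]+)/(?P<book>[^/.]+)"
    r"\.(?P<section>[^/.]+)\.(?P<fmt>[^/.]+)$"
)


def resolve_section(url: str) -> Optional[Dict[str, object]]:
    """Return the pipeline settings for a single-section request, or None."""
    m = SECTION_URL.match(url)
    if m is None:
        return None  # not a section URL (for example, a whole-book request)
    return {
        "source": f"/{m['author']}/{m['book']}.xml",
        "stylesheet": "page.html.xsl",
        # Handed to the stylesheet so it can emit only the requested section.
        "params": {"sectionID": m["section"]},
    }
```

Because the IDs are assumed to contain no dots, a whole-book URL such as `/ccel/augustine/confessions.html` (hypothetical IDs) does not match the section pattern, so the two URL shapes cannot be confused.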

Current Status

• Still in “beta” stage
  - This new system still has a few bugs, but it will be going online in a preliminary form very soon
  - As more and more documents are converted to well-formed ThML, they can be used by Cocoon, and we will hopefully phase out the old system over time
  - There are SO many documents on the CCEL that this may take some time

Current Status

• New server
  - Our new Dell PowerEdge 2500 server runs Cocoon (being a Java app, it’s not the most efficient user of system resources)
  - We are currently in the process of moving the entire site over to the PowerEdge
  - When this is complete, the new XML-based system will be available for people to try out

Some examples from the new site…