Developing an STM DTD/Schema: Strategic design choices

Alexander (‘Sasha’) Schwarzman, AGU. Extreme Markup Languages 2006, Montréal, Canada, August 7 – 11, 2006

Transcript of Developing an STM DTD/Schema: Strategic design choices

Page 1: Developing an STM DTD/Schema: Strategic design choices

Requirements

Does an agreed-upon Requirements document exist? (Get one!)

What is your XML’s role?

Archival copy-of-record (preserving scientific content)?

Means of producing a pretty PDF?

Both?

Much more?

Architecture

When during production is XML created? How is accuracy checked at each stage?

Dummy empty elements for not-yet-assigned metadata plus use of configurable production-stage-specific Business Rules Checker / Validator / QC Tool?

Multiple DTDs: a separate one for each production stage?
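One way to picture the "dummy empty elements plus stage-specific checker" option is the minimal sketch below. It assumes a single permissive document model in which not-yet-assigned metadata is present as empty elements, and a configurable table that says which of them must be filled by each production stage. The stage names, element names, and sample values are all hypothetical, invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical stages and element names (assumptions, not from any real DTD).
# Each stage lists the metadata elements that must be non-empty by that stage.
REQUIRED_BY_STAGE = {
    "accepted": ["title"],
    "copyedited": ["title", "doi"],
    "published": ["title", "doi", "volume", "pub-date"],
}

def check_stage(xml_text, stage):
    """Return a list of problems for the given production stage."""
    root = ET.fromstring(xml_text)
    problems = []
    for name in REQUIRED_BY_STAGE[stage]:
        # Take the first element with this name anywhere in the document.
        elem = next(root.iter(name), None)
        if elem is None:
            problems.append(f"<{name}> is missing")
        elif not (elem.text or "").strip():
            problems.append(f"<{name}> is still an empty placeholder")
    return problems

# Placeholder values; the DOI below is a dummy, not a real identifier.
ARTICLE = """<article>
  <meta>
    <title>Example title</title>
    <doi>10.0000/example</doi>
    <volume/>
    <pub-date/>
  </meta>
</article>"""

print(check_stage(ARTICLE, "copyedited"))  # no problems at this stage
print(check_stage(ARTICLE, "published"))   # the two empty placeholders are reported
```

The same file passes at an early stage and fails at a later one, so the grammar stays the same throughout production and only the rule table changes. The alternative raised in the handout, a separate DTD per stage, moves the same knowledge into the grammar itself.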

XML “layering”: What “layer” to use for enforcing editorial style and business rules?

DTD / parser?

Validator / Schematron?

Human editors?

Revisable unit (what is the elemental unit?)

Article?

Issue?

Arbitrary / cross-journal article collection?

Volume / year?

Journal?

More than one of these?

Scope

For what material?

Current?

Future-only?

Legacy?

All of the above or some combination?

What is the extent of an article / book?

Does it include supplementary material, like datasets and computable spreadsheets?

Do you model “extra stuff” as just another structured section or is it something different?

Special links (“related links”) section?

Page 2: Developing an STM DTD/Schema: Strategic design choices

Developing an STM DTD / Schema: Strategic Design Choices (cont’d)

Modeling Language Choices

Which constraint language is primary?

DTD?

XSD?

RELAX NG?

How many DTDs / schemas (purpose of each)?

Authoring?

Conversion / Transformation?

Production?

Archiving?

Separate or shared: If your content includes journal articles, newspaper articles, book chapters, books, case studies, lecture notes, etc., should you use:

Distinct DTD / schema for each?

A large shared structure?

A DTD / schema suite with common modules?

“Off-the-shelf, Altered-to-fit, or Bespoke?” (T. Usdin)

If altered, what public model?

“compatible with” or “informed by” (subset or superset)?

If bespoke, do you use any public models at all (for tables and math, for instance)?

Modeling Design Choices

“Prussian” or “Californian”: prescriptive or descriptive? Flexible or enforcing?

Generated or Explicit text? (depends on XML’s role)

Preserve generation / rendition rules?

Different approach for text and bibliographic references?
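The generated-versus-explicit question can be made concrete with a small sketch. It assumes a hypothetical `<ref>` element with `<author>`, `<year>`, and `<title>` children and an invented house style of "Author (Year), Title."; none of these names come from a real DTD. In the explicit variant the punctuation is stored in the XML as mixed content; in the generated variant the XML holds only the data and the punctuation lives in a rendition rule.

```python
import xml.etree.ElementTree as ET

# Explicit text: punctuation is part of the XML (mixed content).
EXPLICIT = (
    "<ref><author>Smith, J.</author> (<year>2005</year>), "
    "<title>An example title</title>.</ref>"
)

# Generated text: element-only content; punctuation is a rendition rule.
GENERATED = (
    "<ref><author>Smith, J.</author><year>2005</year>"
    "<title>An example title</title></ref>"
)

def render_explicit(xml_text):
    """Concatenate the stored text exactly as it appears in the XML."""
    return "".join(ET.fromstring(xml_text).itertext())

def render_generated(xml_text):
    """Apply a hypothetical house style: Author (Year), Title."""
    ref = ET.fromstring(xml_text)
    return f"{ref.findtext('author')} ({ref.findtext('year')}), {ref.findtext('title')}."

# Both variants yield the same rendition, but only the explicit one
# records the punctuation inside the archived document itself.
assert render_explicit(EXPLICIT) == render_generated(GENERATED)
print(render_generated(GENERATED))
```

If the XML is the archival copy-of-record, the generated variant reproduces the published text only as long as the rendition rule is preserved alongside it, which is what the "preserve generation / rendition rules?" question is about.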

How to model bibliographic references?

Mixed content?

Genre-specific “strict models” (with an escape hatch provided)?

“Tag abuse” tolerance?

How to reference non-XML components, e.g., figures, in XML?

By an ID that maps to a set of multiple images in an archive?

By naming a specific file from the set? Which one is “the mother of all images”?

Which components to store / migrate? Is “storing cheaper than thinking”? (D. Lapeyre)
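The "ID that maps to a set of images" option might look like the sketch below. It assumes the XML carries only a figure ID, while a separate manifest records which renditions exist in the archive and which one is the master. The `fig-ref` element, the `rid` attribute, the rendition names, and the file names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest: figure ID -> renditions held in the archive.
# "master" marks the rendition from which the others can be regenerated.
FIGURE_ARCHIVE = {
    "fig01": {
        "master": "fig01.tif",
        "web": "fig01.png",
        "thumbnail": "fig01_thumb.gif",
    },
}

def resolve_figure(fig_id, purpose="master"):
    """Return the file for a purpose, falling back to the master rendition."""
    renditions = FIGURE_ARCHIVE[fig_id]
    return renditions.get(purpose, renditions["master"])

# The XML names the figure, not a file.
ref = ET.fromstring('<fig-ref rid="fig01"/>')
print(resolve_figure(ref.get("rid"), "web"))    # a rendition that exists
print(resolve_figure(ref.get("rid"), "print"))  # no such rendition: master is used
```

Under this design the XML does not change when renditions are added, dropped, or migrated. Naming a specific file in the XML is simpler, but it forces the choice of "the mother of all images" into every document.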

How to model math?

MathML presentation versus content (computation)? How to ensure that the same math symbols look identical in different browsers (the same Unicode code points look different in various browsers, e.g., epsilon and varepsilon)?

LaTeX plus GIFs? How to ensure that special characters that occur both in a displayed formula and inline look identical?

Just GIFs?
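The epsilon problem can be made visible with a small QC sketch. Unicode has two code points that both read as "epsilon": U+03B5 and U+03F5. Which glyph shape a given font or browser draws for each is font-dependent, so the sketch does not try to decide which one is "right"; it only reports where each variant occurs so that inline and displayed uses can be compared. The sample string is invented for illustration.

```python
import unicodedata

# Two distinct code points that are both read as "epsilon".
EPSILON_VARIANTS = ("\u03b5", "\u03f5")

def find_epsilons(text):
    """List (position, code point, Unicode name) for each epsilon variant."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in EPSILON_VARIANTS
    ]

# Invented sample: one variant inline, the other in the displayed formula.
sample = "\u03b5 = 0.1 inline, but \u03f5 = 0.1 in the displayed formula"
for hit in find_epsilons(sample):
    print(hit)
```

A check like this catches mixed code points in text and MathML, but it cannot see inside GIFs, which is one of the costs of the image-only options.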

“Just because you can, doesn’t mean you should” (D. Lapeyre)

The lure of modeling for its own sake. Simplicity is easier to maintain over time.