Automated QA of DITA content sets

Posted on 08-Jan-2017



Automated QA of DITA Content Sets

Ben Colborn, Sr. Manager, Technical Publications, Nutanix

“[The Machine] is a universal educator, surely raising the level of human intelligence. …Every age has done its work … with the best tools or contrivances it knew, the tools most successful in saving the most precious thing in the world—human effort.”—Frank Lloyd Wright, “The Art and Craft of the Machine”

What is quality?

First order of quality: ROT vs. RAT

ROT:
• Redundant
• Obsolete
• Trivial

RAT:
• Relevant
• Accurate
• Timely

Second order: Surface features
What can be discerned by an editor:
• General writing conventions
• Organizational conventions
• Domain conventions
• Information types
• Grammaticality
• Terminology

Levels of edit (the slide's table had columns Level, Definition, and Automatable):

• Coordination: manuscript handling, job monitoring and control
• Policy: ensuring that a publication reflects the policy of the organization
• Integrity: ensuring that parts of a publication match
• Screening: spelling, S-V agreement
• Copy clarification: clarifying illegible text, preparing graphics
• Format: ensuring conformity with format
• Mechanical style: checking capitalization, abbreviations, use of numbers, consistency of spelling, organizational terminology
• Language: checking grammar, usage, parallelism, conciseness
• Substantive: ensuring that the necessary content for the intended scope is present

Which levels of edit can be automated?

Automate what you can to free people to do what computers can’t!

Division of labor

Human:
• Coordination
• Policy (mostly)
• Copy clarification
• Substantive

Machine:
• Integrity: validate and check for completeness
• Screening: spell checker, grammar checker
• Format: schema-driven authoring, automatic stylesheet application
• Mechanical style: QA plugin
• Language: Acrolinx and aspirants

Mechanical style example
• MMSTP prohibits “click on”
• How reliably will a computer find all occurrences of “click on” in 500 pages of content? How long will it take?
• How reliably will a person? How long will it take?
• What tasks of higher impact could the person have done in the same time?
• How will the person feel after making this attempt?
• What about when there are 100 rules and not just one?
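To make the machine side of this trade-off concrete, a few lines of Python can scan a whole content set for a banned phrase in seconds. This is a hypothetical sketch, not part of the QA plugin; the function name and directory layout are invented for illustration:

```python
import re
from pathlib import Path

# Hypothetical example: flag every occurrence of the banned phrase
# "click on" in a directory tree of DITA topics.
BANNED = re.compile(r"\bclick on\b", re.IGNORECASE)

def find_violations(content_dir):
    """Return (file name, line number, line text) for each match."""
    violations = []
    for topic in sorted(Path(content_dir).glob("**/*.dita")):
        text = topic.read_text(encoding="utf-8")
        for num, line in enumerate(text.splitlines(), 1):
            if BANNED.search(line):
                violations.append((topic.name, num, line.strip()))
    return violations
```

A person checking 500 pages for the same phrase would take hours and still miss occurrences; the script is exhaustive and repeatable, which is the whole argument of this slide.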

Approaches

Terminology:
• Acrolinx
• Shared dictionary
• String-matching script

Markup:
• Constraints
• Schematron
• XPath-matching script

Ditanauts QA plugin
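For the markup side, a Schematron rule expresses the same “click on” check declaratively. This is a minimal sketch of what such a rule might look like, not taken from the presentation:

```xml
<!-- Hypothetical Schematron rule: report "click on" inside any
     DITA command element (matched by @class token, not element name) -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <sch:rule context="*[contains(@class, ' task/cmd ')]">
      <sch:report test="contains(., 'click on')">
        Use "click", not "click on".
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```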

Three legs of a QA process
• Spelling/grammar check
• Editorial/peer review
• QA plugin

Ditanauts QA plugin

Overview
• Freely available on GitHub
• Customization of the DITA Open Toolkit HTML plugin
• Checks for the occurrence of XPath expressions
• Creates a report DB (for customization) and user-readable reports

Process
1. Compile terminology checks into XPath expressions
2. Check each topic for the occurrence of user-configured XPath expressions
3. Write a database file (DITA topic) listing each topic with the found violations and other metadata
4. Write user-readable reports: quality summary, DITAMAP, CSV
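The steps above can be sketched in Python. This is a simplified stand-in for the plugin's actual pipeline (which evaluates full XPath via XSLT); the rule names are invented, and `xml.etree` supports only a small XPath subset:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical rule set: rule id -> (XPath-subset expression, message).
RULES = {
    "external-xref": (".//xref[@scope='external']",
                      "Review external cross-reference"),
    "has-shortdesc": (".//shortdesc",
                      "Check short description wording"),
}

def check_topic(topic_xml, rules=RULES):
    """Return a list of (rule id, message) hits for one topic."""
    root = ET.fromstring(topic_xml)
    hits = []
    for rule_id, (expr, message) in rules.items():
        if root.findall(expr):
            hits.append((rule_id, message))
    return hits

def write_report(results):
    """Render {topic name: hits} as a CSV report string."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["topic", "rule", "message"])
    for topic_name, hits in results.items():
        for rule_id, message in hits:
            writer.writerow([topic_name, rule_id, message])
    return out.getvalue()
```

The real plugin writes the intermediate database as a DITA topic so that reports themselves can be processed with the Open Toolkit; the CSV here stands in for the user-readable output.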

Input: XPath expressions
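The slide presumably showed sample input expressions. Checks of this shape, written against DITA @class tokens rather than element names, might look like the following (hypothetical examples, not taken from the plugin):

```
(: Flag body text containing the banned phrase :)
//*[contains(@class, ' topic/body ')]//text()[contains(., 'click on')]

(: Flag steps that have no command element :)
//*[contains(@class, ' task/step ')][not(*[contains(@class, ' task/cmd ')])]
```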

Input: Expression compiler
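The compiler's job can be sketched as turning a plain terminology list into XPath expressions. The input format and function names below are hypothetical; the plugin's actual configuration may differ:

```python
# Hypothetical: compile banned terms into XPath 1.0 expressions.
# Note: terms containing an apostrophe would need extra quoting
# handling, which this sketch omits.
def compile_term(term, message):
    """Build an XPath that flags any element text containing the term."""
    expr = "//*[contains(text(), '%s')]" % term
    return {"term": term, "xpath": expr, "message": message}

def compile_terms(terms):
    """Compile a list of (term, message) pairs into rule dicts."""
    return [compile_term(t, m) for t, m in terms]
```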

Execution

OT 1.x:
> ant -Dtranstype=qa -Douter.control=quiet \
    -Dargs.input=samples/taskbook.ditamap \
    -Dsetchunk=true

OT 2.x:
> dita -f qa -i samples/taskbook.ditamap \
    -Dsetchunk=true

Output: Database file

Output: CSV

Output: DITAMAP

Best practices
• Keep the list of violations short.
• Only include violations that are likely to occur in your content set.
• Only include violations that are impactful.
• Only include rules that are systematically violated.
• Update the violations list over time.
• Carefully craft checks to avoid false positives.
• Provide a specific resolution for each violation.
• Use @class rather than element names in the XPath expressions.
• Designate a project team member to run the QA routine and follow up on resolution.
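The @class recommendation matters because DITA specialization renames elements: a check on an element name misses every specialization, while a check on the class token catches them all. A hypothetical pair:

```
(: Fragile: misses specializations of <cmd> :)
//cmd[contains(., 'click on')]

(: Robust: matches <cmd> and anything specialized from it :)
//*[contains(@class, ' task/cmd ')][contains(., 'click on')]
```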