Automated QA of DITA content sets

Automated QA of DITA Content Sets
Ben Colborn, Sr. Manager, Technical Publications, Nutanix

Transcript of Automated QA of DITA content sets

Page 1: Automated QA of DITA content sets

Automated QA of DITA Content Sets

Ben Colborn, Sr. Manager, Technical Publications, Nutanix

Page 2: Automated QA of DITA content sets

“[The Machine] is a universal educator, surely raising the level of human intelligence. …Every age has done its work … with the best tools or contrivances it knew, the tools most successful in saving the most precious thing in the world—human effort.”—Frank Lloyd Wright, “The Art and Craft of the Machine”

Page 3: Automated QA of DITA content sets

What is quality?

Page 4: Automated QA of DITA content sets

First order of quality: ROT vs. RAT

Redundant, Obsolete, Trivial

Relevant, Accurate, Timely

Page 5: Automated QA of DITA content sets

Second order: Surface features
What can be discerned by an editor
• General writing conventions
• Organizational conventions
• Domain conventions
• Information types
• Grammaticality
• Terminology

Page 6: Automated QA of DITA content sets
Page 7: Automated QA of DITA content sets

Levels of edit and their definitions:
• Coordination: Manuscript handling, job monitoring and control
• Policy: Ensuring that a publication reflects the policy of the organization
• Integrity: Ensuring that parts of a publication match
• Screening: Spelling, S-V agreement
• Copy clarification: Clarifying illegible text, preparing graphics
• Format: Ensuring conformity with format
• Mechanical style: Checking capitalization, abbreviations, use of numbers, consistency of spelling, organizational terminology
• Language: Checking grammar, usage, parallelism, conciseness
• Substantive: Ensuring that the necessary content for the intended scope is present

Which levels of edit can be automated?

Page 8: Automated QA of DITA content sets

Automate what you can to free people to do what computers can’t!

Page 9: Automated QA of DITA content sets

Division of labor

Human
• Coordination
• Policy (mostly)
• Copy clarification
• Substantive

Machine
• Integrity: validate and check for completeness
• Screening: spell checker, grammar checker
• Format: schema-driven authoring, automatic stylesheet application
• Mechanical style: QA plugin
• Language: Acrolinx and aspirants

Page 10: Automated QA of DITA content sets

Mechanical style example
• MMSTP prohibits “click on”
• How reliably will a computer find all occurrences of “click on” in 500 pages of content? How long will it take? (See the sketch after this list.)
• How reliably will a person? How long will it take?
• What tasks of higher impact could the person have done in the same time?
• How will the person feel after making this attempt?
• What about when there are 100 rules and not just one?
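The computer side of that comparison is a few lines of scripting. Below is a minimal sketch, not part of the talk's tooling: it assumes the topics live under a topics/ directory and simply counts raw-text matches of the prohibited phrase.

import pathlib
import re

RULE = re.compile(r"\bclick on\b", re.IGNORECASE)  # the MMSTP prohibition

# Count occurrences of the phrase in every DITA topic under topics/ (assumed path).
for topic in sorted(pathlib.Path("topics").glob("**/*.dita")):
    hits = RULE.findall(topic.read_text(encoding="utf-8"))
    if hits:
        print(f"{topic}: {len(hits)} occurrence(s) of 'click on'")

A run over hundreds of topics finishes in seconds, is exhaustive, and never gets tired, which is exactly the contrast the questions above draw.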

Page 11: Automated QA of DITA content sets

Approaches

Terminology
• Acrolinx
• Shared dictionary
• String-matching script

Markup
• Constraints
• Schematron
• XPath-matching script (sketched below)

Ditanauts QA plugin
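As a rough illustration of the XPath-matching idea for markup checks, here is a sketch that is not from the slides; the rule names, expressions, and sample file are invented for the example.

from lxml import etree  # pip install lxml

# Hypothetical markup rules, one XPath expression per rule name.
MARKUP_RULES = {
    "no-bold-element": "//b",                   # prefer semantic markup over <b>
    "xref-without-href": "//xref[not(@href)]",  # cross-reference with no target
}

def violations(topic_path):
    """Count how many times each rule's expression matches in one topic."""
    tree = etree.parse(topic_path, etree.XMLParser(recover=True))
    return {name: len(tree.xpath(expr)) for name, expr in MARKUP_RULES.items()}

print(violations("samples/example.dita"))  # assumed sample topic

The same pattern covers terminology rules once the terms are expressed as XPath expressions, which is the approach the Ditanauts QA plugin takes.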

Page 12: Automated QA of DITA content sets

Three legs of a QA process
• Spelling/grammar check
• Editorial/peer review
• QA plugin

Page 13: Automated QA of DITA content sets

Ditanauts QA plugin

Page 14: Automated QA of DITA content sets

Overview
• Freely available on GitHub
• Customization of the Open Toolkit HTML plugin
• Checks for the occurrence of XPath expressions
• Creates a report DB (for customization) and user-readable reports

Page 15: Automated QA of DITA content sets

Process
1. Compile terminology checks into XPath expressions
2. Check each topic for the occurrence of user-configured XPath expressions
3. Write a database file (DITA topic) listing each topic with the found violations and other metadata
4. Write user-readable reports: quality summary, DITAMAP, CSV (a script-form sketch of the whole flow follows)
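Pictured as a standalone script, the four steps could look like the sketch below. The actual plugin implements them as an Open Toolkit customization, so this Python rendering, its term list, input directory, and CSV layout are all assumptions for illustration only.

import csv
import pathlib
from lxml import etree  # pip install lxml

# Step 1: compile terminology checks into XPath expressions (hypothetical terms).
TERMS = ["click on", "utilize"]
RULES = {t: f'//text()[contains(., "{t}")]' for t in TERMS}

rows = []
# Step 2: check each topic for the configured expressions.
for topic in sorted(pathlib.Path("samples").glob("*.dita")):  # assumed input directory
    tree = etree.parse(str(topic), etree.XMLParser(recover=True))
    for term, expr in RULES.items():
        count = len(tree.xpath(expr))
        if count:
            rows.append({"topic": topic.name, "violation": term, "count": count})

# Steps 3-4: the plugin writes a DITA database topic plus summary, DITAMAP, and CSV
# reports; a single CSV stands in for those outputs here.
with open("qa-report.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=["topic", "violation", "count"])
    writer.writeheader()
    writer.writerows(rows)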

Page 16: Automated QA of DITA content sets

Input: XPath expressions

Page 17: Automated QA of DITA content sets

Input: Expression compiler

Page 18: Automated QA of DITA content sets

Execution

OT 1.x:
> ant -Dtranstype=qa -Douter.control=quiet \
    -Dargs.input=samples/taskbook.ditamap \
    -Dsetchunk=true

OT 2.x:
> dita -f qa -i samples/taskbook.ditamap \
    -Dsetchunk=true

Page 19: Automated QA of DITA content sets

Output: Database file

Page 20: Automated QA of DITA content sets
Page 21: Automated QA of DITA content sets

Output: CSV

Page 22: Automated QA of DITA content sets

Output: DITAMAP

Page 23: Automated QA of DITA content sets

Best practices
• Keep the list of violations short.
• Only include violations that are likely to occur in your content set.
• Only include violations that are impactful.
• Only include rules that are systematically violated.
• Update the violations list over time.
• Carefully craft checks to avoid false positives.
• Provide a specific resolution for each violation.
• Use @class rather than element names in the XPath expressions (see the example after this list).
• Designate a project team member to run the QA routine and follow up on resolution.
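The @class recommendation matters because DITA specialization preserves class tokens but not element names, so a check written against an element name misses specialized elements. The expressions below are illustrative, not taken from the slides:

//cmd matches only the literal <cmd> element.
//*[contains(@class, ' task/cmd ')] also matches any element specialized from <cmd>.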