Post on 18-Feb-2017
06:42
Planning and implementing conversion of legacy files to XML/DITA compliancy
Bernard Aschwanden
www.publishingsmarter.com
bernard@publishingsmarter.com
Migrating to XML with FrameMaker Conversion Tables
@publishsmarter

The agenda
- Convert content from unstructured to structured
- EDD, conversion table, and a structured template
- Using basic examples to get you started, this session:
  - Convert files with content such as character tags and paragraph tags
  - Add support for images and tables
  - Demo converting unstructured to structured using conversion tables
- Samples are easy to recreate, but complex and powerful in functionality

Housekeeping and note taking
- Not all slides or topics are equally weighted
- Use some, discard others
- Slide speed varies as this is a QUICK session
- Questions? Ask along the way!
I’d love to claim errors/typos is on purpose… they isn’t, ain’t, and weren’t never; however, I’ll fix ‘em as I can…

About your speaker
- Publishing Smarter: President
- Content strategist, publishing technologies expert, author, and geek-enough
- Certified Technical Trainer
  - DITA
  - Content management
  - Topic-based writing
- Society for Technical Communication (www.stc.org)
  - President
  - STC Associate Fellow

Standard disclaimer
- In the interest of brevity I will make some blanket statements to keep it simple
- It’s not all 100% “the truth”, but I’ll stay close
- Purists may complain
  - And they are wrong! (except when they are right)

Major disclaimer
- This is a quick session
- There are LOTS of samples in slides or FrameMaker
  - Simple samples
  - Still complex ideas
  - Tricky to set things up
- Happy to share files
- To review/apply this
  - Watch the recording
  - Jot down “time stamps”
    - Cool item at 17:23
    - Excel formulas 18:57
    - Word updates 26:33
  - Then watch it again
    - Pause it, rewind, try it
    - Do this at your own pace
    - Slowly test with your content

Before you structure content

Legacy content and document review
- Include analysis of legacy files
  - Identify what can stay and what needs to go
  - Approach with flexibility
- What structure to use?
  - Decide on the overall structural environment you want to work with
  - Could include S1000D, DocBook and DITA
  - Can also build your own
- Develop your FrameMaker support materials
  - EDD, conversion table, and a template at the least

ID a rule set
- Use existing rules
  - If rules already exist, you have a solid starting point
  - Learn the rules and adapt your content to them
- Build your own rules
  - If no rules exist, you can set your own from the start
  - Learn how to create the rules and build all the components
- Hybrid approach
  - If you see a set of rules that look promising, learn about them
  - Find out how you need to adapt your content to match the rules
  - If that does not work, then consider adapting the rules

Create structure from chaos
Not chaos, but it’s at least unstructured and needs work

Method 1: Manually, element by element
- Apply structural rules to your content
- Manually wrap content such as text ranges and tables
- Continue to manually wrap contents of paragraphs together in Para elements
- Then wrap sequences of Head and Para elements in Section elements
- And so on until entire document is wrapped in single highest-level element

Method 2: Automatically
- Similar to adding structure manually
  - Apply rules to document objects below paragraph level
  - Then at paragraph level and through successively higher levels
  - Stops at the root element, or when no more rules exist
- Automatic wrapping requires a conversion table
  - Provides table of mappings to automate task of adding structure to unstructured documents
  - Uses paragraph and character tags, and object types (such as equations or footnotes), to identify how to wrap document components in elements
  - Also specifies how to wrap child elements in parent elements

Conversion tables
Let’s dive into it

Conversion Table: Overview
- Conversion table: rules for mapping content in unstructured files to structured content
- Conversion table can be split up into several tables with text or graphics in between for comments
  - The document cannot have any tables other than conversion tables
- Must be saved at least once before it can be used
  - Allows for iterative testing though
- Can be in structured or unstructured document

Conversion Table: Organization
- Organization of conversion table:
  - Regular table, with at least 3 columns and 1 body row
  - Additional columns and heading/footing rows can hold comments
  - Each body row holds 1 rule
Column      Purpose
Column 1    specifies document object, child element, or sequence to wrap
Column 2    specifies element in which to wrap
Column 3    specifies optional qualifier (“nickname”) to use as temporary label

Conversion Table: Sample
Wrap this object    In this element    With this qualifier
P:Bullet            Item               Unordered
P:Numbered          Item               Ordered

Column      Purpose
Column 1    specifies document object, child element, or sequence to wrap
Column 2    names the element in which to wrap
Column 3    specifies optional qualifier (“nickname”) to use as temporary label
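
To make the three columns concrete, here is a minimal Python sketch that models the sample rows as data and looks up the wrapper for one document object. The names Rule, SAMPLE_TABLE, and wrap_object are hypothetical, made up for illustration; FrameMaker reads the table from a document, not from code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One body row of a conversion table (hypothetical model)."""
    wrap: str                        # Column 1: object, child element, or sequence
    element: str                     # Column 2: element in which to wrap
    qualifier: Optional[str] = None  # Column 3: optional temporary label


# The two rows from the sample slide.
SAMPLE_TABLE = [
    Rule("P:Bullet", "Item", "Unordered"),
    Rule("P:Numbered", "Item", "Ordered"),
]


def wrap_object(obj: str, table: list) -> Optional[tuple]:
    """Return (element, qualifier) for the first rule whose Column 1 equals obj."""
    for rule in table:
        if rule.wrap == obj:
            return rule.element, rule.qualifier
    return None  # no rule: the object stays unwrapped


print(wrap_object("P:Bullet", SAMPLE_TABLE))
```

Both paragraph formats land in the same Item element; the qualifier is what lets a later rule tell them apart.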

Ways to create conversion tables
Manually or automatically

Conversion Table Production: Manual
You have full control. No automatically inserted content. All the rules are specific to what you tell the system. However, you have to be explicit. (I am not a fan)
Wrap this object    In this element    With this qualifier
P:Head1             Head1
P:Head2             Head2
P:Body              Body
P:Code              Code
SV:Current Date     CurrentDate
C:Code              cCode
TC:                 CELL
TR:                 ROW

Conversion Table Production: Automatic
Autogenerated content, then develop more rules or tweak as needed. Rules based on content used in source files. (I like this a lot more)
- Use if you already have an unstructured document
  - Scans body page flows to ID every object that can be structured
  - Lists object type and format tag (if any) used in document
  - Maps object to element
    - Element tag named same as format tag
    - If object does not have format, element tag is a default name, for example: CELL or BODY
    - Removes parentheses and other characters to create valid element tag
    - Object type identifier in lowercase is prepended to duplicate tags
- Developer adds additional rules to:
  - Wrap elements in higher-level elements
  - Set attributes as elements are created
  - Wrap all elements in root element (by using root RE or by making elements wrap up properly)
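
The naming behavior described above can be imitated in a few lines of Python. This is only a sketch of the idea, not FrameMaker's actual algorithm: generate_rules and DEFAULT_TAGS are hypothetical names, the set of characters kept in an element tag is an assumption, and so is the TB-to-BODY default (the slides only confirm CELL and ROW for TC: and TR:).

```python
import re

# Default element tags for objects with no format tag.
# CELL and ROW appear in the slides; mapping TB to BODY is an assumption.
DEFAULT_TAGS = {"TC": "CELL", "TR": "ROW", "TB": "BODY"}


def generate_rules(objects):
    """Build (Column 1, Column 2) pairs from (type_code, format_tag) pairs."""
    rules, seen, used = [], set(), set()
    for code, fmt in objects:
        if (code, fmt) in seen:
            continue  # each object/format pair gets one rule
        seen.add((code, fmt))
        if fmt:
            # Assumption: keep letters, digits, period, underscore, hyphen.
            element = re.sub(r"[^A-Za-z0-9._-]", "", fmt)
        else:
            element = DEFAULT_TAGS.get(code, code.upper())
        if element in used:
            # Duplicate tag: prepend the object type identifier in lowercase.
            element = code.lower() + element
        used.add(element)
        rules.append((f"{code}:{fmt}", element))
    return rules


found = [("P", "Code"), ("C", "Code"), ("SV", "Current Date"), ("TC", "")]
for wrap, element in generate_rules(found):
    print(f"{wrap:<20}{element}")
```

With these inputs the paragraph format Code keeps its name, while the character format Code becomes cCode, matching the manual sample shown earlier.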

Number of conversion tables you need
- Based on types of high level elements and amount/quality of content
- If documents are clear and short, with a single highest level
  - Create unique conversion table for each document type and convert in bulk
  - For example, if your documents are already clearly defined as task, reference, or concept, you can apply one of three conversion tables to groups of files
- If documents are clear, but long and with multiple highest levels
  - Create a single conversion table that covers as much as possible and then divide up content as required, or
  - Reorganize first, then you have clear, short files with one highest level
- If documents are scattered with content
  - Create a single conversion table that does initial work and then manually rework the structure as needed, or
  - Rewrite and reorganize first to have clear, short files with one highest level

Generate conversion tables

Your first conversion table
1. Open document with objects you want to structure
2. Structure Tools > Generate Conversion Table
3. From the Generate Conversion Table dialog box, select Generate New Conversion Table
4. Click Generate

Expected results
Unnamed conversion table appears with rules based on objects in document and element tags based on format tags (tags used in the file, not all in catalog)
Wrap this object    In this element    With this qualifier
P:Title             Title
P:Body              Body
P:Heading1          Heading1
P:Heading2          Heading2
P:Heading3          Heading3
C:Emphasis          Emphasis
X:See Heading       SeeHeading
M:Index             Index
M:Cross-Ref         Cross-Ref

Update a conversion table
Do so for a more complete list of objects (for example, after a chapter is parsed, a more complete one is found)
1. Open document with objects you want to structure
2. Structure Tools > Generate Conversion Table
3. From the Generate Conversion Table dialog box, select Update Conversion Table
4. From Update Conversion Table popup menu, choose a previously saved and open conversion table to update
5. Click Generate
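
One way to picture an update is as a merge: rules you already have stay as they are, and only objects the table has not seen yet get new rows. The sketch below assumes that behavior (check it against your FrameMaker version); update_table is a hypothetical helper and the rows are plain tuples of (Column 1, Column 2, Column 3).

```python
def update_table(existing, generated):
    """Append generated rules for objects not already listed in Column 1."""
    known = {row[0] for row in existing}
    added = [row for row in generated if row[0] not in known]
    return existing + added


# A table already tuned by hand, then a scan of a second, richer chapter.
existing = [("P:Body", "Para", None), ("P:Bullet", "Item", "Bullet")]
generated = [("P:Body", "Body", None), ("C:Emphasis", "Emphasis", None)]

for row in update_table(existing, generated):
    print(row)
```

The hand-edited P:Body rule survives, and C:Emphasis is added at the end.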

Rules to be aware of

Rule Syntax: Character Restrictions
- Case-sensitivity in tags
  - Format and element tags must be specified as defined in catalogs
  - Qualifier tags are case-sensitive; two occurrences of one qualifier must match exactly
- Special characters in tags include ( ) & | , * + ? % [ ] : \
  - In format tags and qualifier tags: allowed, but must be preceded by backslash (\) in table
  - In element tags: not allowed
- A space character in tags does not need to be preceded with backslash (you can write tag Format A)
- Wildcard character (%) in tags
  - Use % in a format or element tag to match zero, one, or more characters (similar to * in general rule)
  - For example, P:%Body matches paragraphs with format tag Body, FirstBody, or BulletBody
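
The wildcard and backslash rules can be tried out with a small Python sketch. It is an illustration of the matching idea under simple assumptions, not FrameMaker's own matcher: tag_pattern_to_regex and tag_matches are hypothetical names, % is treated as "zero or more characters", and a backslash makes the next character literal.

```python
import re


def tag_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a conversion-table tag pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped special character
            i += 2
        elif ch == "%":
            parts.append(".*")  # wildcard: zero, one, or more characters
            i += 1
        else:
            parts.append(re.escape(ch))  # ordinary character, spaces included
            i += 1
    return re.compile("".join(parts))


def tag_matches(pattern: str, tag: str) -> bool:
    """Case-sensitive match of a whole tag against a pattern."""
    return tag_pattern_to_regex(pattern).fullmatch(tag) is not None


print(tag_matches("%Body", "FirstBody"))                            # True
print(tag_matches("%Body", "BodyText"))                             # False
print(tag_matches(r"Current Date \(Long\)", "Current Date (Long)")) # True
```

Matching is case-sensitive on purpose, which mirrors the rule that tags must be written as defined in the catalogs.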

Rule Syntax: Specifying What to Wrap
- In Column 1 of the conversion table
  - 1 or 2 letter code to ID item type
  - Type format name to narrow definitions
Object             Code    Additional info after code                Example
Paragraph          P:      Paragraph format tag                      P:Body
Text range         C:      Character format tag                      C:Emphasis
Table              T:      Table format tag                          T:Format A
Table title        TT:     (none)                                    TT:
Table heading      TH:     (none)                                    TH:
Table body         TB:     (none)                                    TB:
Table row          TR:     (none)                                    TR:
Table cell         TC:     (none)                                    TC:
System variable    SV:     Variable format name                      SV:Current Date
User variable      UV:     Variable format name                      UV:CompanyName
Graphic            G:      (none)                                    G:
Footnote           F:      Location of footnote                      F:Flow
Marker             M:      Marker type                               M:Index
Cross-reference    X:      Cross-reference format                    X:Heading Only
Text inset         TI:     (none)                                    TI:
Equation           Q:      Equation size: Small, Medium, or Large    Q:Medium
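
If you keep conversion rules in a spreadsheet or script before typing them into FrameMaker, a quick check of the Column 1 codes catches typos early. The sketch below is one hypothetical way to do that: OBJECT_CODES copies the codes from the table above, and parse_object_spec is an illustrative helper, not part of any FrameMaker API.

```python
# Object codes as listed in the table above.
OBJECT_CODES = {
    "P": "Paragraph", "C": "Text range", "T": "Table",
    "TT": "Table title", "TH": "Table heading", "TB": "Table body",
    "TR": "Table row", "TC": "Table cell",
    "SV": "System variable", "UV": "User variable",
    "G": "Graphic", "F": "Footnote", "M": "Marker",
    "X": "Cross-reference", "TI": "Text inset", "Q": "Equation",
}


def parse_object_spec(spec: str):
    """Split a Column 1 entry into (object type, additional info or None)."""
    code, sep, info = spec.partition(":")  # split at the first colon only
    if not sep or code not in OBJECT_CODES:
        raise ValueError(f"unknown object code in {spec!r}")
    return OBJECT_CODES[code], info or None


print(parse_object_spec("SV:Current Date"))
print(parse_object_spec("TC:"))
```

Element sequences (entries that start with E:) are a different kind of rule and are deliberately left out of this check.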

Rule Syntax: Specifying the Wrapper
- In Column 2 of the conversion table
  - Type object identifier E: (optional)
  - Followed by element tag
Wrap this object            In this element    With this qualifier
P:Body                      Para
C:ReportName                Report
T:Format Part               PartsTable
TT:                         TableTitle
TH:                         TableHeading
TB:                         TableBody
TR:                         PartsRow
TC:                         PartName
SV:Current Date \(Long\)    Date
UV:Customer                 Customer
G:                          Graphic
F:Flow                      Footnote
M:Index                     IndexEntry
X:ElemNumTextPage           XRef
TI:                         Para
Q:Large                     EQ

Rule Syntax: Specifying a Qualifier
- In Column 3, type qualifier (optional) for new element tag
Wrap this object    In this element    With this qualifier
P:Bullet            Item               Bullet
P:StepRestart       Item               Step1
P:Step              Item               Step

Rule Syntax: Identifying Sequence to Wrap
- In Column 1 of the conversion table
  - Type E: for element, then the element tag
  - Type qualifier (optional) in brackets
  - Add more element tags with code identifiers and connectors (as in EDD)
Symbol               Meaning
Plus sign (+)        Item is required and can occur more than once
Question mark (?)    Item is optional and can occur once
Asterisk (*)         Item is optional and can occur more than once
Comma (,)            Items must occur in order given
Vertical bar (|)     Any one of items in sequence can occur
Parentheses          Beginning and end of sequence

Wrap this object                In this element    With this qualifier
P:Bullet                        Item               Bullet
P:StepRestart                   Item               Step1
P:Step                          Item               Step
E:Item[Bullet]+                 List
E:Item[Step1], E:Item[Step]+    List
E:Head, (Para | List)+          Section
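
To see how the three sequence rules build structure bottom-up, here is a toy Python model. It assumes paragraphs have already been wrapped into elements with qualifiers, supports only the connectors in the symbol table above, and applies each rule once, left to right. The names rule_to_regex and wrap_sequences are hypothetical; this is a way to reason about your rules, not a copy of FrameMaker's engine.

```python
import re

# One rule token: an element (optional E: prefix, optional [qualifier]) or a connector.
TOKEN = re.compile(
    r"\s*(?:(?:E:)?(?P<tag>[A-Za-z][\w.-]*)(?:\[(?P<qual>[^\]]+)\])?"
    r"|(?P<op>[+?*|(),]))"
)


def label(tag, qualifier=None):
    """Text form of an element, for example Item[Bullet] or Para."""
    return f"{tag}[{qualifier}]" if qualifier else tag


def rule_to_regex(rule):
    """Translate a Column 1 sequence rule into a regex over <label> tokens."""
    out, pos, rule = [], 0, rule.strip()
    while pos < len(rule):
        m = TOKEN.match(rule, pos)
        if not m:
            raise ValueError(f"cannot parse rule at: {rule[pos:]!r}")
        if m.group("tag"):
            token = "<" + label(m.group("tag"), m.group("qual")) + ">"
            out.append("(?:" + re.escape(token) + ")")
        elif m.group("op") == "(":
            out.append("(?:")
        elif m.group("op") != ",":  # a comma just means "followed by"
            out.append(m.group("op"))
        pos = m.end()
    return re.compile("".join(out))


def wrap_sequences(nodes, rule, wrapper):
    """Wrap each run of sibling nodes that matches rule in a new wrapper node.

    A node is a tuple (tag, qualifier, children).
    """
    pattern = rule_to_regex(rule)
    encoded, index_at = "", {}
    for i, (tag, qual, _children) in enumerate(nodes):
        index_at[len(encoded)] = i
        encoded += "<" + label(tag, qual) + ">"
    index_at[len(encoded)] = len(nodes)
    result, last = [], 0
    for m in pattern.finditer(encoded):
        if m.start() == m.end():
            continue  # ignore empty matches from fully optional rules
        i, j = index_at[m.start()], index_at[m.end()]
        result.extend(nodes[last:i])
        result.append((wrapper, None, nodes[i:j]))
        last = j
    result.extend(nodes[last:])
    return result


nodes = [
    ("Head", None, []), ("Para", None, []),
    ("Item", "Bullet", []), ("Item", "Bullet", []),
    ("Item", "Step1", []), ("Item", "Step", []), ("Item", "Step", []),
    ("Head", None, []), ("Para", None, []),
]
rules = [
    ("E:Item[Bullet]+", "List"),
    ("E:Item[Step1], E:Item[Step]+", "List"),
    ("E:Head, (Para | List)+", "Section"),
]
for rule, wrapper in rules:
    nodes = wrap_sequences(nodes, rule, wrapper)
print([tag for tag, _qual, _children in nodes])
```

The order of the rules matters in this model: the two List rules have to run before the Section rule, otherwise the Section rule would stop at the first unwrapped Item.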

Rule Syntax: Adding Attributes to Elements
- Optional in Column 2 of the conversion table
  - Type attribute name and value in brackets after element tag
  - Separate name and value with equal sign, and enclose value in double quotation marks
Wrap this object                In this element              With this qualifier
P:Bullet                        Item                         Bull
P:StepRestart                   Item                         Step1
P:Step                          Item                         Step
E:Item[Bull]+                   List [Type = "Bulleted"]
E:Item[Step1], E:Item[Step]+    List [Type = "Numbered"]
E:Head, (Para | List)+          Section
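
A Column 2 entry with an attribute can be pulled apart with one regular expression. The sketch below handles only the single-attribute form shown in the table, and it assumes straight double quotation marks around the value; parse_wrapper is a hypothetical helper for checking your own rule lists, not a FrameMaker function.

```python
import re

# Element tag, then an optional [Name = "Value"] group.
WRAPPER = re.compile(
    r'\s*(?:E:)?(?P<tag>[^\s\[]+)\s*'
    r'(?:\[\s*(?P<name>[^\s=]+)\s*=\s*"(?P<value>[^"]*)"\s*\])?\s*'
)


def parse_wrapper(cell: str):
    """Return (element tag, attribute dict) for a Column 2 entry."""
    m = WRAPPER.fullmatch(cell)
    if not m:
        raise ValueError(f"cannot parse wrapper: {cell!r}")
    attrs = {m["name"]: m["value"]} if m["name"] else {}
    return m["tag"], attrs


print(parse_wrapper('List [Type = "Bulleted"]'))
print(parse_wrapper("Section"))
```

Curly quotation marks pasted from a word processor or slide deck fail this check with a ValueError, which is a useful early warning before the rule goes into a real table.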

Rule Syntax: Promoting Anchored Object
- When user adds structure to document, table or graphic becomes child of paragraph with anchor
- FrameMaker can break table or graphic out of its paragraph and promote element to be sibling of paragraphs:
  - In Column 2:
    - Type element tag for table or graphic
    - Add keyword “promote” in parentheses after element tag
Wrap this object    In this element             With this qualifier
T:Format A          ProcedureTable (promote)
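
The effect of promotion on the element tree can be sketched in Python with plain dictionaries. This is a simplified picture: the promoted element is placed directly after the paragraph that held its anchor, which is an assumption made for illustration, and promote is a hypothetical function name.

```python
def promote(siblings, promoted_tags):
    """Move children with a promoted tag out of their parent, to sit after it."""
    result = []
    for node in siblings:
        kept = [c for c in node["children"] if c["tag"] not in promoted_tags]
        moved = [c for c in node["children"] if c["tag"] in promoted_tags]
        result.append({**node, "children": kept})  # parent without the promoted child
        result.extend(moved)                       # promoted child becomes a sibling
    return result


section = [
    {"tag": "Para", "children": [{"tag": "ProcedureTable", "children": []}]},
    {"tag": "Para", "children": []},
]
print([node["tag"] for node in promote(section, {"ProcedureTable"})])
```

Without the promote keyword, ProcedureTable would stay nested inside the first Para, which rarely matches what an EDD expects.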

Rule Syntax: Flagging Format Overrides
- Flags instances where the Paragraph Designer or Character Designer was used to make formatting changes without saving them to the catalog format
- Adds an attribute called Override with the value Yes to the affected elements
- In Column 1:
  - Add rule “flag paragraph format overrides”
  - Add rule “flag character format overrides”
Wrap this object                   In this element    With this qualifier
flag paragraph format overrides
flag character format overrides

Rule Syntax: Wrapping Untagged Text
- To wrap untagged formatted text:
  - In Column 1, add rule “untagged character formatting”
  - In Column 2, add element tag
Wrap this object                 In this element    With this qualifier
untagged character formatting    UntaggedText

Converting files
Structuring a file (or set of files) with a conversion table

Procedure: Structuring Current Unstructured Docs
1. Open conversion table and unstructured document
2. In unstructured doc, import element definitions from existing structured template or EDD
   - Makes elements available in Element Catalog
   - If you do not perform this step, next steps produce elements in Element Catalog defined by rules specified in conversion table
   - Can always import element definitions after generating structure
3. In unstructured file, StructureTools > Utilities > Structure Current Document
4. From Conversion Table Document popup menu, choose open conversion table file
5. Click Add Structure. A new document appears with content wrapped into elements as defined in rules of conversion table
6. Validate, correct errors, save file

Procedure: Structuring Group of Unstructured Files
1. Place files to convert in separate directory
2. Open a conversion table file
3. StructureTools > Utilities > Structure Documents and the Structure Documents dialog box appears
4. From Conversion Table Document popup menu, choose the conversion table
5. Under Input Unstructured Files, set directory to structure
6. Optionally, if files have unique extension, in Suffix text box, type extension (otherwise, all files in directory will be structured)
7. Under Output Structured Files, set directory to write to

(continued)
8. Turn on Allow Existing Files to Be Overwritten
   - As documents are structured, resulting files might have same names as some existing files in directory specified for storing structured files
   - When on, overwrites older versions
   - When off, skips over files with existing matching filenames and presents log file
9. Click Add Structure
10. When the “Operation completed normally” alert appears, click OK to dismiss alert (structured files appear in output directory with filenames matching those in input directory)
11. Open each file and import element definitions from any existing structured template or EDD (makes elements in Element Catalog match those in structured template or EDD)
12. Validate, correct errors, save files
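
Before running a batch, it can help to preview which files the Suffix and overwrite settings would touch. The Python sketch below only inspects two directories and makes no call into FrameMaker; plan_batch and its parameters are hypothetical names that mirror the dialog box options described in the procedure.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir, suffix=None, allow_overwrite=False):
    """Preview a batch run: return (files to structure, files that would be skipped)."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    to_write, skipped = [], []
    for src in sorted(p for p in input_dir.iterdir() if p.is_file()):
        if suffix and src.suffix != suffix:
            continue  # Suffix set: only files with that extension are structured
        if (output_dir / src.name).exists() and not allow_overwrite:
            skipped.append(src.name)  # same name already in the output directory
        else:
            to_write.append(src.name)
    return to_write, skipped


if __name__ == "__main__":
    # Hypothetical directory names; point these at your own folders.
    if Path("unstructured").is_dir():
        print(plan_batch("unstructured", "structured", suffix=".fm"))
```

Anything that lands in the skipped list corresponds to an entry you would expect to see in the log file when Allow Existing Files to Be Overwritten is off.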

Converting books
Structuring book documents with a conversion table

Procedure: Structuring Unstructured Book
1. Open saved conversion table file
2. Open unstructured book
3. In unstructured book, import element definitions from any structured template or EDD
   - Makes elements available in Element Catalog
   - If you do not perform this step, next steps produce elements in Element Catalog defined by rules specified in conversion table
   - Can always import element definitions after generating structure
4. Select StructureTools > Utilities > Structure Current Book (the Structure Book dialog box appears)

(continued)
5. From Conversion Table Document popup menu, choose saved conversion table file
6. In Output Directory text box, type directory for saving structured files or choose from Browse
7. Turn on Allow Existing Files to Be Overwritten
   - As you add structure to documents, resulting files might have same names as some existing files in specified directory for storing structured files
   - When on, overwrites older versions
   - When off, skips over files with existing matching filenames and presents log file
8. Click Add Structure (structured book and files appear in output directory with filenames matching those in input directory)
9. Validate, correct errors, save

Conclusion and contact
Summing up the discussion, and options to continue it.

About this session
- Convert content from unstructured to structured
- EDD, conversion table, and a structured template
- Using basic examples to get you started, this session:
  - Convert files with content such as character tags and paragraph tags
  - Add support for images and tables
  - Demo converting unstructured to structured using conversion tables
- Samples are easy to recreate, but complex and powerful in functionality

My request
- Please suggest this session to others
- If there are any problems with slides, please let me know
- Remember my disclaimer at the beginning
  - Not all slides are equal: Use some, discard others
  - In the interest of brevity I make some blanket statements
  - It’s not all 100% “the truth”, but I’ll stay close
  - Purists may complain
    - And they are wrong! (except when they are right)

Solving business problems through communication

Follow up contact information
905 833 8448 (Eastern Time)
bernard@publishingsmarter.com
www.linkedin.com/in/bernardaschwanden
@publishsmarter
www.publishingsmarter.com