BizTalk Flat File Parsing Annotations
Flat File Parsing = LL(k) Parser
The flat file parser is entirely grammar driven and is implemented as an LL(k) parser (look-ahead LL parser). Schemas are translated into a grammar, which is then translated into tables that are used during parsing.
http://en.wikipedia.org/wiki/LL_parser
The flat file parser (FFParser) in BizTalk Server 2004 (BTS2004) is a streaming parser.
Source of information: newsgroups, David Downing.
Manual annotations = attributes of the /annotation/appinfo/schemaInfo element
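The "grammar translated into tables" idea can be illustrated with a generic table-driven LL(1) parser. The following is a minimal Python sketch under stated assumptions: the toy grammar for one comma-delimited record, the token kinds, and the names PARSE_TABLE, tokenize and parse are all hypothetical. None of it reflects the grammar or tables BizTalk actually generates from a schema, and BizTalk's parser uses k tokens of lookahead rather than one.

```python
from typing import Dict, List, Tuple

# Hypothetical toy grammar for one comma-delimited record (not BizTalk's real grammar):
#   Record -> FIELD Rest
#   Rest   -> "," FIELD Rest | <empty>
Table = Dict[Tuple[str, str], List[str]]

PARSE_TABLE: Table = {
    ("Record", "FIELD"): ["FIELD", "Rest"],
    ("Rest", ","): [",", "FIELD", "Rest"],
    ("Rest", "$"): [],  # end of input: Rest derives the empty string
}
NONTERMINALS = {"Record", "Rest"}


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split a line into (kind, value) tokens; kind is "FIELD", "," or "$" (end)."""
    tokens: List[Tuple[str, str]] = []
    for i, part in enumerate(text.split(",")):
        if i > 0:
            tokens.append((",", ","))
        tokens.append(("FIELD", part))
    tokens.append(("$", ""))
    return tokens


def parse(text: str) -> List[str]:
    """Table-driven LL(1) parse; returns the field values in order."""
    tokens = tokenize(text)
    pos = 0
    stack = ["$", "Record"]  # the last element is the top of the stack
    fields: List[str] = []
    while stack:
        top = stack.pop()
        kind, value = tokens[pos]
        if top in NONTERMINALS:
            # Predict: the table entry for (nonterminal, next token) picks the production.
            production = PARSE_TABLE.get((top, kind))
            if production is None:
                raise SyntaxError(f"unexpected {kind!r} while expanding {top}")
            stack.extend(reversed(production))  # leftmost symbol ends up on top
        elif top == kind:
            # Match: a terminal on the stack consumes the current token.
            if kind == "FIELD":
                fields.append(value)
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, got {kind!r}")
    return fields
```

For example, parse("AAA,BBB,CCC") returns ["AAA", "BBB", "CCC"]; all parsing decisions come from PARSE_TABLE, not from hand-written code per record.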
suppress_empty_nodes="true|false"
Removes empty nodes from the XML stream. This can be used to eliminate fields that are empty after parsing but whose XSD type does not allow empty values.
Default = false
suppress_empty_nodes="true|false" Sample
Input (fields delimited by "-", one record per line):
Field1-Field2-Field3
Field1--Field3
Field1-Field2-
suppress_empty_nodes="false":
<Root> <Record1 Field1="Field1" Field2="Field2" Field3="Field3"></Record1> <Record1 Field1="Field1" Field2="" Field3="Field3"></Record1> <Record1 Field1="Field1" Field2="Field2" Field3=""></Record1> </Root>
suppress_empty_nodes="true":
<Root> <Record1 Field1="Field1" Field2="Field2" Field3="Field3" /> <Record1 Field1="Field1" Field3="Field3" /> <Record1 Field1="Field1" Field2="Field2" /> </Root>
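The effect shown in the sample can be approximated as a post-processing step on the parsed XML. This is a hypothetical Python sketch using the standard xml.etree.ElementTree module, not BizTalk's implementation (the parser suppresses nodes while streaming). It assumes fields are attributes, as in the sample, and only removes empty attributes; empty element fields are left alone.

```python
import xml.etree.ElementTree as ET


def suppress_empty_nodes(xml_text: str) -> str:
    """Remove empty attribute fields, approximating suppress_empty_nodes="true".

    Only attribute fields are handled; element fields are left untouched.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Collect names first: attributes must not be deleted while iterating over them.
        empty = [name for name, value in element.attrib.items() if value == ""]
        for name in empty:
            del element.attrib[name]
    return ET.tostring(root, encoding="unicode")
```

Applied to the "false" output above, it drops Field2 from the second record and Field3 from the third, matching the "true" output.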
generate_empty_nodes="true|false"
Serialization: generates empty nodes for records that exist in the XML instance data, i.e. adds missing fields to records in the XML stream. This allows positional records to line up correctly and places the delimiters in delimited records.
Default = true
generate_empty_nodes="true|false" Sample (positional)
(Record1 with Field1, Field2, Field3, each 3 positions long) <Root> <Record1 Field1="AAA" Field3="CCC"/> </Root>
true: AAA   CCC (Field2 written as 3 spaces)
false: AAACCC
generate_empty_nodes="true|false" Sample (delimited)
(Record1 with Field1, Field2, Field3, infix comma delimiter) <Root> <Record1 Field1="AAA" Field3="CCC"/> </Root>
true: AAA,,CCC
false: AAA,CCC
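Both samples can be reproduced with a small serializer sketch. It is hypothetical Python, not the BizTalk flat file assembler: the function names are illustrative, and left-justified space padding (without truncation of longer values) is an assumption.

```python
from typing import Dict, List, Tuple


def serialize_positional(record: Dict[str, str], layout: List[Tuple[str, int]],
                         generate_empty_nodes: bool) -> str:
    """Write one positional record; layout lists (field name, width) in schema order."""
    parts: List[str] = []
    for name, width in layout:
        if name in record:
            # Assumption: left-justified, space-padded (longer values are not truncated here).
            parts.append(record[name].ljust(width))
        elif generate_empty_nodes:
            parts.append(" " * width)  # missing field still occupies its positions
        # Otherwise the missing field is skipped and later fields shift left.
    return "".join(parts)


def serialize_delimited(record: Dict[str, str], names: List[str], delimiter: str,
                        generate_empty_nodes: bool) -> str:
    """Write one infix-delimited record with the fields in schema order."""
    values: List[str] = []
    for name in names:
        if name in record:
            values.append(record[name])
        elif generate_empty_nodes:
            values.append("")  # empty value keeps its delimiter slot
    return delimiter.join(values)
```

With record {"Field1": "AAA", "Field3": "CCC"}, the positional writer yields "AAA   CCC" or "AAACCC", and the delimited writer yields "AAA,,CCC" or "AAA,CCC", depending on generate_empty_nodes.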
allow_early_termination="true|false"
For positional records, because this is a grammar-driven parser, delimiters encountered while parsing a positional record are not treated as delimiters; they are treated as part of the positional field currently being parsed.
allow_early_termination allows the right-most positional field to be treated as a delimited field (i.e. it can be shorter or longer than specified by the pos_length setting).
Only the right-most positional field is allowed to terminate early. Right-most = starting from a positional record, the right-most child of the record as it is parsed from left to right (this includes positional child records of the parent positional record).
With parser_optimization="complexity" and allow_early_termination="true", you effectively change the right-most positional field into a delimited field, thus allowing this field to terminate early.
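allow_early_termination="true|false" Sample
Records of three 3-character positional fields, terminated by CR LF (0x0D 0x0A), with allow_early_termination="false":
1. OK: AAABBBCCC(0x0D 0x0A)AAABBBCCC(0x0D 0x0A)
2. Not OK: AAABBBCC(0x0D 0x0A)AAABBBCCC(0x0D 0x0A)
With allow_early_termination="true", case 2 now parses OK.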
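The sample can be traced with a small stream-splitting sketch. It is hypothetical Python, not BizTalk code, and assumes a single flat positional record (no positional child records) with every record followed by CR LF. With fixed widths, the CR of the short record is swallowed into the last field, just as the text describes for delimiters inside positional records, so the record delimiter is not found where expected. With early termination, the last field reads up to the delimiter instead.

```python
from typing import List


def parse_positional(data: str, widths: List[int], allow_early_termination: bool,
                     record_delimiter: str = "\r\n") -> List[List[str]]:
    """Split a stream of positional records, each followed by record_delimiter."""
    records: List[List[str]] = []
    pos = 0
    while pos < len(data):
        fields: List[str] = []
        for i, width in enumerate(widths):
            if i == len(widths) - 1 and allow_early_termination:
                # Right-most field behaves like a delimited field: read up to the delimiter.
                end = data.find(record_delimiter, pos)
                end = len(data) if end == -1 else end
                fields.append(data[pos:end])
                pos = end
            else:
                # Fixed width: delimiter characters are consumed as ordinary field data.
                fields.append(data[pos:pos + width])
                pos += width
        if not data.startswith(record_delimiter, pos):
            raise ValueError(f"expected record delimiter at offset {pos}")
        pos += len(record_delimiter)
        records.append(fields)
    return records
```

Calling parse_positional("AAABBBCC\r\nAAABBBCCC\r\n", [3, 3, 3], False) raises ValueError. With True, it returns [["AAA", "BBB", "CC"], ["AAA", "BBB", "CCC"]].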
parser_optimization="complexity|speed"
Controls the grammar that is generated from the schema and used to parse the flat file document.
The complexity setting produces a more complex grammar and can be used to parse records that have complex nested optional children. Although the parser can handle much more varied data with this setting, it does so at the expense of parsing speed, and even then not all data layouts can be parsed successfully.
The speed setting optimizes for speed and is limited in the complexity of the data it can parse.
When parser_optimization is set to complexity, you may have validation failures against a schema when there are many optional nodes in the same group or record. You may need to set lookahead_depth to zero (0) to avoid validation errors.
Default = speed
parser_optimization="complexity|speed" Sample
ms-help://BTS_2004/SDK/htm/ebiz_prog_pipe_vhtb.htm
Sample from the BizTalk help (do not forget to set lookahead_depth="0"):
Root ("," prefix)
  Field1 (optional)
  Field2 (optional)
  Field3 (optional)
  Field4 (optional)
  Record1 ("," infix)
    Field5
    Field6
parser_optimization="complexity" Sample
Instance: ,1,2,3,4
Output (Record1 mandatory):
<parser_optimization_complexity Field1="1" Field2="2"><Record1 Field5="3" Field6="4"></Record1></parser_optimization_complexity>
Output (Record1 changed to optional, minOccurs=0):
<parser_optimization_complexity Field1="1" Field2="2" Field3="3" Field4="4"/>
parser_optimization="speed" Sample
Instance: ,1,2,3,4
Output (Record1 optional, minOccurs=0):
<parser_optimization_speed Field1="1" Field2="2" Field3="3" Field4="4"></parser_optimization_speed>
Output (Record1 changed to mandatory): Parsing Error!
parser_optimization="complexity|speed" Conclusion
Complexity setting: parsing engine uses both top-down and bottom-up parsing
Speed setting: parsing engine uses top-down only
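The four sample outcomes can be reproduced with a deliberately simplified, hypothetical model. It is not the BizTalk engine and does not implement the top-down/bottom-up strategy described above. "Speed" is modelled as a greedy pass that fills the optional fields first and never reconsiders. "Complexity" is modelled as a backtracking search that gives the optional fields fewer values until the remainder fits Record1. The names parse_speed, parse_complexity, OPTIONAL_FIELDS and RECORD1_FIELDS are illustrative assumptions.

```python
from typing import Dict, List, Union

# Hypothetical model of the help sample: Root ("," prefix) with four optional
# fields followed by Record1, which holds exactly two fields.
OPTIONAL_FIELDS = ["Field1", "Field2", "Field3", "Field4"]
RECORD1_FIELDS = ["Field5", "Field6"]
Result = Dict[str, Union[str, Dict[str, str]]]


def _split(instance: str) -> List[str]:
    """Drop the empty text before the leading "," prefix delimiter."""
    return instance.split(",")[1:]


def _record1_fits(remaining: int, record1_mandatory: bool) -> bool:
    """Record1 takes exactly two values, or none at all if it is optional."""
    return remaining == len(RECORD1_FIELDS) or (remaining == 0 and not record1_mandatory)


def _build(values: List[str], n_optional: int) -> Result:
    result: Result = dict(zip(OPTIONAL_FIELDS, values[:n_optional]))
    rest = values[n_optional:]
    if rest:
        result["Record1"] = dict(zip(RECORD1_FIELDS, rest))
    return result


def parse_speed(instance: str, record1_mandatory: bool) -> Result:
    """Greedy stand-in: fill optional fields first and never reconsider."""
    values = _split(instance)
    n = min(len(values), len(OPTIONAL_FIELDS))
    if not _record1_fits(len(values) - n, record1_mandatory):
        raise ValueError("Parsing Error!")
    return _build(values, n)


def parse_complexity(instance: str, record1_mandatory: bool) -> Result:
    """Backtracking stand-in: give the optional fields fewer values until Record1 fits."""
    values = _split(instance)
    for n in range(min(len(values), len(OPTIONAL_FIELDS)), -1, -1):
        if _record1_fits(len(values) - n, record1_mandatory):
            return _build(values, n)
    raise ValueError("Parsing Error!")
```

With ",1,2,3,4" and a mandatory Record1, the greedy model has nothing left for Record1 and fails. The backtracking model retreats to two optional fields and hands "3" and "4" to Record1, as in the samples.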
lookahead_depth="nn"
The lookahead_depth setting controls how far ahead the parser looks when matching data: it is the number of tokens the parser examines ahead in the token stream to make a parsing prediction.
0 means infinite lookahead. The higher the number, the more expensive it is to locate matches during the parse.
Ideal: evaluate lookahead_depth from infinite (0) down to the minimum value that still parses correctly, and then add 2. Because the parser is grammar driven and the schema goes through several transformations before becoming a grammar, there is no way of correlating the grammar back to the schema itself.
Default = 3
The lookahead_depth applies to speed mode as well, but because the grammar generated in speed mode is much less complex, it is harder to create a scenario that actually demonstrates this.
Higher lookahead_depth = more memory consumption
lookahead_depth="nn" Sample
Instance (3 fields): Field1+Field1+Field1+
Result (depth=2): missing data! Only 2 fields in the XML:
<lookahead_depth_low><Record1 Field1="Field1"/><Record2><Record3 Field1="Field1"/></Record2></lookahead_depth_low>
Result (depth=3 or depth=0): OK, all 3 fields in the XML:
<lookahead_depth_infinite><Record1 Field1="Field1"/><Record2><Record3 Field1="Field1"/><Record4 Field1="Field1"/></Record2></lookahead_depth_infinite>
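Bounded-lookahead prediction, and the "minimum value plus 2" tuning rule, can be illustrated with a toy. This is a hypothetical Python sketch: the two-alternative grammar and the names ALTERNATIVES, predict, fields_recognised and recommended_depth are assumptions, not BizTalk's generated grammar. Its depths therefore do not correspond to the numbers in the sample. As in the sample, too short a window makes the parser commit to an alternative that recognises fewer fields than the input contains.

```python
from typing import List, Sequence

# Hypothetical toy grammar: a record is either two or three "F+" field groups.
ALTERNATIVES: List[List[str]] = [
    ["F", "+", "F", "+"],            # alternative 0: two fields
    ["F", "+", "F", "+", "F", "+"],  # alternative 1: three fields
]


def predict(tokens: Sequence[str], depth: int) -> int:
    """Return the index of the first alternative consistent with the lookahead window.

    depth == 0 means unlimited lookahead, mirroring lookahead_depth="0".
    """
    window = len(tokens) if depth == 0 else depth
    upcoming = list(tokens[:window])
    for index, alternative in enumerate(ALTERNATIVES):
        # Only the first `window` tokens are compared; anything beyond is invisible.
        if alternative[:window] == upcoming:
            return index
    raise SyntaxError("no alternative matches the lookahead window")


def fields_recognised(tokens: Sequence[str], depth: int) -> int:
    """Number of fields the predicted alternative assigns to the input."""
    return ALTERNATIVES[predict(tokens, depth)].count("F")


def recommended_depth(tokens: Sequence[str], max_depth: int = 20) -> int:
    """Tuning rule from the text: smallest depth that matches infinite lookahead, plus 2."""
    reference = fields_recognised(tokens, 0)
    for depth in range(1, max_depth + 1):
        try:
            if fields_recognised(tokens, depth) == reference:
                return depth + 2
        except SyntaxError:
            continue
    return 0  # fall back to infinite lookahead
```

For the token stream F+F+F+, a depth of 2 predicts the two-field alternative (data is lost), while depth 0 predicts all three fields. In practice the tuning rule would be applied against representative instance files rather than a single stream.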