BizTalk Flat File Parsing Annotations. Flat File Parsing = LL(k) Parser The flat file parser is...

BizTalk Flat File Parsing Annotations

Flat File Parsing = LL(k) Parser

The flat file parser is entirely grammar driven and is implemented as an LL(k) parser (look-ahead LL parser). Schemas are translated into a grammar, which is then translated into the tables used during parsing.

http://en.wikipedia.org/wiki/LL_parser

The FFParser for BTS2004 is a streaming parser.

Source of information = newsgroups, David Downing!

Manual annotations = attributes of the /annotation/appinfo/schemaInfo element
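To make the location of the annotations concrete, here is a minimal Python sketch that reads the schemaInfo attributes from a schema. The schema text is a hypothetical, trimmed-down header, the attribute values are only illustrative, and the `b` namespace URI is an assumption that should be checked against a real flat file schema.

```python
import xml.etree.ElementTree as ET

# XS is the standard XSD namespace. B is ASSUMED to be the BizTalk 2003
# annotation namespace; verify it against your own schema file.
XS = "http://www.w3.org/2001/XMLSchema"
B = "http://schemas.microsoft.com/BizTalk/2003"

# Hypothetical, trimmed-down flat file schema header (values are illustrative).
SCHEMA = f"""
<xs:schema xmlns:xs="{XS}" xmlns:b="{B}">
  <xs:annotation>
    <xs:appinfo>
      <b:schemaInfo standard="Flat File"
                    parser_optimization="complexity"
                    lookahead_depth="0"
                    suppress_empty_nodes="true"
                    generate_empty_nodes="true"
                    allow_early_termination="false" />
    </xs:appinfo>
  </xs:annotation>
</xs:schema>
"""


def read_annotations(schema_text):
    """Return the schemaInfo attributes of a schema as a plain dict."""
    root = ET.fromstring(schema_text)
    path = f"{{{XS}}}annotation/{{{XS}}}appinfo/{{{B}}}schemaInfo"
    info = root.find(path)
    return dict(info.attrib) if info is not None else {}


annotations = read_annotations(SCHEMA)
print(annotations["parser_optimization"], annotations["lookahead_depth"])
```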

suppress_empty_nodes="true|false"

Removes empty nodes from the XML stream produced by the parser. Use it to drop fields that are empty after parsing when their XSD type does not allow empty values.

Default = false

suppress_empty_nodes="true|false"

Sample

Flat file instance (fields delimited by "-"):

Field1-Field2-Field3
Field1--Field3
Field1-Field2-

false

<Root>
  <Record1 Field1="Field1" Field2="Field2" Field3="Field3"></Record1>
  <Record1 Field1="Field1" Field2="" Field3="Field3"></Record1>
  <Record1 Field1="Field1" Field2="Field2" Field3=""></Record1>
</Root>

true

<Root>
  <Record1 Field1="Field1" Field2="Field2" Field3="Field3" />
  <Record1 Field1="Field1" Field3="Field3" />
  <Record1 Field1="Field1" Field2="Field2" />
</Root>
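The effect shown in this sample can be imitated with a few lines of Python. This is only a toy model of the observable result, not the BizTalk parser: `parse_record` and its parameters are hypothetical names, and each record is represented as a dict of attributes.

```python
def parse_record(line, field_names, delimiter="-", suppress_empty_nodes=False):
    """Toy model: split one delimited line into a dict of attributes."""
    values = line.split(delimiter)
    record = dict(zip(field_names, values))
    if suppress_empty_nodes:
        # Drop fields that came out empty instead of emitting Field="".
        record = {name: value for name, value in record.items() if value != ""}
    return record


FIELDS = ["Field1", "Field2", "Field3"]
LINES = ["Field1-Field2-Field3", "Field1--Field3", "Field1-Field2-"]

kept = [parse_record(line, FIELDS) for line in LINES]
dropped = [parse_record(line, FIELDS, suppress_empty_nodes=True) for line in LINES]

for before, after in zip(kept, dropped):
    print(before, "->", after)
```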

generate_empty_nodes="true|false"

Applies to serialization.

Generates empty nodes for records that exist in the XML instance data.

Adds missing fields to records in the XML stream. This lets positional records line up correctly and puts the delimiters in place in delimited records.

Default = true

generate_empty_nodes="true|false" Sample positional

(Record1/Field1,Field2,Field3 each 3pos)

<Root>
  <Record1 Field1="AAA" Field3="CCC"/>
</Root>

true: AAA   CCC (the missing Field2 is written as 3 pad positions)

false: AAACCC
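A small Python sketch of the positional case, to show why the missing Field2 matters for alignment. It is a toy serializer under stated assumptions: `serialize_positional` is a hypothetical helper and the pad character is assumed to be a space.

```python
def serialize_positional(record, layout, generate_empty_nodes=True, pad=" "):
    """Toy model: write fields at fixed widths. layout = [(name, width), ...]."""
    out = []
    for name, width in layout:
        if name in record:
            # Pad or truncate the value to exactly the field width.
            out.append(record[name].ljust(width, pad)[:width])
        elif generate_empty_nodes:
            # A missing field is still written, as pad characters only.
            out.append(pad * width)
        # Otherwise the missing field is skipped and later fields shift left.
    return "".join(out)


LAYOUT = [("Field1", 3), ("Field2", 3), ("Field3", 3)]
RECORD = {"Field1": "AAA", "Field3": "CCC"}

with_empty = serialize_positional(RECORD, LAYOUT, generate_empty_nodes=True)
without_empty = serialize_positional(RECORD, LAYOUT, generate_empty_nodes=False)
print(repr(with_empty), repr(without_empty))
```

With generate_empty_nodes off, Field3 ends up at the offset where Field2 is expected, so a positional reader of the output would misread it.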

generate_empty_nodes="true|false" Sample delimited

(Record1/Field1,Field2,Field3 delimited infix 'comma')

<Root>
  <Record1 Field1="AAA" Field3="CCC"/>
</Root>

true: AAA,,CCC

false: AAA,CCC
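The delimited case can be sketched the same way. Again this is a toy serializer, not BizTalk code; `serialize_delimited` is a hypothetical name and the record is a plain dict.

```python
def serialize_delimited(record, field_names, generate_empty_nodes=True, delimiter=","):
    """Toy model: join field values with an infix delimiter."""
    if generate_empty_nodes:
        # Missing fields become empty strings, so their delimiters are kept.
        values = [record.get(name, "") for name in field_names]
    else:
        # Missing fields are skipped, so the field positions collapse.
        values = [record[name] for name in field_names if name in record]
    return delimiter.join(values)


FIELD_NAMES = ["Field1", "Field2", "Field3"]
DELIMITED_RECORD = {"Field1": "AAA", "Field3": "CCC"}

delimited_true = serialize_delimited(DELIMITED_RECORD, FIELD_NAMES, True)
delimited_false = serialize_delimited(DELIMITED_RECORD, FIELD_NAMES, False)
print(delimited_true, delimited_false)
```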

allow_early_termination="true|false"

Because the parser is grammar driven, delimiters encountered while parsing a positional record are not treated as delimiters; they are treated as part of the positional field that is currently being parsed.

allow_early_termination lets the right-most positional field be treated as a delimited field (i.e. it can be shorter or longer than specified by its pos_length setting).

Only the right-most positional field is allowed to terminate early.

Right-most = starting from a positional record, the right-most child of the record as it is parsed from left to right. (This includes positional child records of the parent positional record.)

parser_optimization="complexity" and allow_early_termination="true" -> you effectively change the right-most positional field into a delimited field, which allows this field to terminate early.

allow_early_termination="true|false" Sample

1. Ok: AAABBBCCC(0x0D 0x0A)AAABBBCCC(0x0D 0x0A)

2. Nok: AAABBBCC(0x0D 0x0A)AAABBBCCC(0x0D 0x0A)

With allow_early_termination="true", instance 2 (Nok) is now Ok.
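The two instances above can be reproduced with a toy positional reader in Python. It is a sketch under assumptions, not the real engine: `parse_stream` is a hypothetical function, each record is three positional fields of 3 characters followed by CR LF, and a positional field consumes characters blindly, including terminator bytes.

```python
def parse_stream(data, layout, terminator="\r\n", allow_early_termination=False):
    """Toy model: read fixed-width records, each followed by a terminator."""
    records, pos = [], 0
    while pos < len(data):
        record = {}
        for i, (name, width) in enumerate(layout):
            is_last = i == len(layout) - 1
            if is_last and allow_early_termination:
                # Right-most field behaves like a delimited field:
                # it runs up to the next terminator, whatever its length.
                end = data.find(terminator, pos)
                if end == -1:
                    end = len(data)
            else:
                # Positional field: take exactly `width` characters,
                # even if some of them are terminator bytes.
                end = pos + width
            record[name] = data[pos:end]
            pos = end
        if not data.startswith(terminator, pos):
            raise ValueError(f"expected record terminator at offset {pos}")
        pos += len(terminator)
        records.append(record)
    return records


LAYOUT = [("Field1", 3), ("Field2", 3), ("Field3", 3)]
OK = "AAABBBCCC\r\nAAABBBCCC\r\n"
NOK = "AAABBBCC\r\nAAABBBCCC\r\n"

ok_records = parse_stream(OK, LAYOUT)
try:
    parse_stream(NOK, LAYOUT)
    nok_failed = False
except ValueError:
    nok_failed = True  # Field3 swallowed the CR, so no terminator follows
nok_records = parse_stream(NOK, LAYOUT, allow_early_termination=True)
print(len(ok_records), nok_failed, nok_records[0]["Field3"])
```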

parser_optimization="complexity|speed"

Controls the grammar that is generated from the schema and used to parse the flat file document.

The complexity setting produces a more complex grammar and can be used to parse records that have complex nested optional children. The parser is much more flexible in the data it can parse with this setting, but at the expense of parsing speed, and not all data layouts can be parsed successfully.

The speed setting optimizes for speed, and is limited in the complexity of the data that can be parsed.

When parser_optimization is set to complexity, you may have validation failures against a schema when there are many optional nodes in the same group or record. You may need to set lookahead_depth to zero (0) to avoid validation errors.

Default = speed

parser_optimization="complexity|speed" Sample

ms-help://BTS_2004/SDK/htm/ebiz_prog_pipe_vhtb.htm

Sample from BizTalk help (do not forget to put lookahead_depth="0")

<schema>
  Root ("," prefix)
    Field1 opt
    Field2 opt
    Field3 opt
    Field4 opt
    Record1 ("," infix)
      Field5
      Field6
</schema>

parser_optimization="complexity" Sample

Instance: ,1,2,3,4

Output (Record1 mandatory):

<parser_optimization_complexity Field1="1" Field2="2"><Record1 Field5="3" Field6="4"></Record1></parser_optimization_complexity>

Output (Record1 changed to optional, minOccurs=0):

<parser_optimization_complexity Field1="1" Field2="2" Field3="3" Field4="4"/>

parser_optimization="speed" Sample

Instance: ,1,2,3,4

Output (Record1 optional, minOccurs=0):

<parser_optimization_speed Field1="1" Field2="2" Field3="3" Field4="4"></parser_optimization_speed>

Output (Record1 changed to mandatory): Parsing Error!

parser_optimization="complexity|speed" Conclusion

Complexity setting: parsing engine uses both top-down and bottom-up parsing

Speed setting: parsing engine uses top-down only
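The four sample outputs for the instance ,1,2,3,4 can be reproduced with a toy model in Python. This is only one way to read the observed results, not the engine's actual algorithm: the speed mode is modelled as a greedy pass with no second try, and the complexity mode as simple backtracking (a stand-in for the combined top-down and bottom-up parsing mentioned above). All function names are hypothetical.

```python
OPTIONAL_FIELDS = ["Field1", "Field2", "Field3", "Field4"]
RECORD1_FIELDS = ["Field5", "Field6"]


def split_tokens(instance):
    # Root uses a "," prefix delimiter, so the text before the first comma is empty.
    return instance.split(",")[1:]


def build(tokens, n_optional):
    """Give the first n_optional tokens to Field1.., the rest to Record1."""
    root = dict(zip(OPTIONAL_FIELDS, tokens[:n_optional]))
    rest = tokens[n_optional:]
    if rest:
        root["Record1"] = dict(zip(RECORD1_FIELDS, rest))
    return root


def fits(tokens, n_optional, record1_mandatory):
    """Record1 needs exactly two tokens, or none if it is optional."""
    rest = len(tokens) - n_optional
    return rest == len(RECORD1_FIELDS) or (rest == 0 and not record1_mandatory)


def parse_speed(instance, record1_mandatory):
    """Greedy: optional fields take as many tokens as they can, no second try."""
    tokens = split_tokens(instance)
    n = min(len(tokens), len(OPTIONAL_FIELDS))
    if not fits(tokens, n, record1_mandatory):
        raise ValueError("Parsing Error!")
    return build(tokens, n)


def parse_complexity(instance, record1_mandatory):
    """Backtracking: retry with fewer optional fields until the rest fits Record1."""
    tokens = split_tokens(instance)
    for n in range(min(len(tokens), len(OPTIONAL_FIELDS)), -1, -1):
        if fits(tokens, n, record1_mandatory):
            return build(tokens, n)
    raise ValueError("Parsing Error!")


INSTANCE = ",1,2,3,4"
print(parse_complexity(INSTANCE, record1_mandatory=True))
print(parse_complexity(INSTANCE, record1_mandatory=False))
print(parse_speed(INSTANCE, record1_mandatory=False))
```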

lookahead_depth="nn"

The lookahead_depth setting controls how far the parser looks ahead when matching data, that is, how far ahead it reads in the parsing token stream before making a parsing prediction.

0 means infinite lookahead.

The higher the number, the more expensive it is to locate matches during the parse.

Ideal: start from infinite lookahead (0), find the minimum lookahead_depth that still parses the data correctly, and then add 2.

Because the parser is grammar driven and the schema goes through several transformations before becoming the final grammar, there is no way to correlate the required depth back to the schema itself.

Default = 3

The lookahead_depth applies to speed mode as well, but because the grammar generated in speed mode is much less complex, it is harder to create a scenario that demonstrates this.

Higher lookahead_depth = more memory consumption
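The "Ideal" tuning advice above can be written down as a small search, sketched here in Python. Everything in it is an assumption for illustration: `parse` is a hypothetical callable that runs your flat file disassembly with a given lookahead_depth and returns the output, and `fake_parse` merely imitates a schema that loses a field when the depth is below 3.

```python
def recommend_lookahead_depth(parse, max_depth=20, margin=2):
    """Find the smallest depth whose output matches infinite lookahead, plus a margin.

    parse(depth) is a hypothetical callable returning the parsed output.
    """
    reference = parse(0)  # 0 = infinite lookahead
    for depth in range(1, max_depth + 1):
        try:
            result = parse(depth)
        except Exception:
            continue  # a parse failure means this depth is too low
        if result == reference:
            return depth + margin
    return 0  # nothing up to max_depth matched; stay on infinite lookahead


def fake_parse(depth):
    """Stand-in parser: drops the third field when the lookahead is too shallow."""
    fields = ["Field1", "Field1", "Field1"]
    return fields if depth == 0 or depth >= 3 else fields[:2]


recommended = recommend_lookahead_depth(fake_parse)
print(recommended)
```

In practice the comparison would be run over a representative set of test instances, since a depth that works for one message may be too low for another.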

lookahead_depth="nn" Sample

Instance (3 fields): Field1+Field1+Field1+

Result (depth=2): Missing data! Only 2 fields in XML

<lookahead_depth_low><Record1 Field1="Field1"/><Record2><Record3 Field1="Field1"/></Record2></lookahead_depth_low>

Result (depth=3 or depth=0): Ok. All 3 fields in XML

<lookahead_depth_infinite><Record1 Field1="Field1"/><Record2><Record3 Field1="Field1"/><Record4 Field1="Field1"/></Record2></lookahead_depth_infinite>