Describing XML Wrappers for Information Integration
Research project XRAKE, November 15, 2001
Research project XRAKE
Merja Ek, Heli Hakkarainen, Pekka Kilpeläinen, Eila Kuikka, and Tommi Penttinen
University of Kuopio
Department of Computer Science and Applied Mathematics
Content
• Research project XRAKE
• Introduction
• General ideas of XW
• Examples
• Implementation & Future
Introduction
• programming error-prone, tedious
• XW
  – declarative
  – serialised data
  – influenced by XML Schema, XSLT
• XW wrapper
  – well-formed XML
  – highly readable
General ideas of XW
• XW can
  – remove data items
  – add structure
  – remove structure
• XW cannot
  – change order of data
• crude transformation
  – followed by e.g. XSLT
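[Figure: source data and an XW wrapper specification go into the XW engine; the XML it produces can be refined further with XSLT]
Source data:
AA x1 x2
BB y1 y2
   z1 z2
XW engine output:
<part-a>
  <e1>x1</e1>
  <e2>x2</e2>
</part-a>
<part-b>
  <line-1>
    <d1>y1</d1>
    <d2>y2</d2>
  </line-1>
  <d3>z2</d3>
</part-b>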
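The transformation in the figure can be imitated by hand to see what "crude" means here. The following is a minimal Python sketch, not the XW engine: the function names `text_element` and `wrap` are hypothetical, and the whitespace-separated layout of the source data is an assumption. It drops data items (the labels AA and BB, and z1), adds structure (`part-a`, `part-b`, `line-1`), and never reorders the remaining items.

```python
import xml.etree.ElementTree as ET

def text_element(parent, tag, text):
    """Append a child element that holds one data item."""
    child = ET.SubElement(parent, tag)
    child.text = text
    return child

def wrap(source):
    """Hand-written stand-in for a wrapper; assumes exactly three input lines."""
    first, second, third = (line.split() for line in source.splitlines())

    part_a = ET.Element("part-a")
    text_element(part_a, "e1", first[1])    # label "AA" is removed
    text_element(part_a, "e2", first[2])

    part_b = ET.Element("part-b")
    line_1 = ET.SubElement(part_b, "line-1")  # added structure
    text_element(line_1, "d1", second[1])   # label "BB" is removed
    text_element(line_1, "d2", second[2])
    text_element(part_b, "d3", third[1])    # z1 is removed, z2 is kept

    return [part_a, part_b]

SOURCE = "AA x1 x2\nBB y1 y2\n   z1 z2\n"
result = "".join(ET.tostring(e, encoding="unicode") for e in wrap(SOURCE))
print(result)
```

Any reordering or renaming beyond this would be left to a later step such as XSLT, as the slide suggests.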
General ideas of XW (cont'd)
• wrapper is a template for output
  – element names
  – structure ~ input structure
• input hierarchically divided into parts
• part ~ element
• part + subparts ~ element + child elements
[Figure: nested parts of the input, labelled <the-whole …>, <part-X …>, <part-Y …>, with subparts <subpart-1 …>, <subpart-2 …>, <subpart-3 …>]
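The part-to-element correspondence can be sketched as a small recursive template walker. This is an illustrative Python sketch under stated assumptions, not XW itself: the `Part` class, its `separator` field, and the `build` function are hypothetical, and the choice of which subparts belong to `part-X` and `part-Y` is made up for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Part:
    """One node of a (hypothetical) wrapper template."""
    name: str                                   # output element name
    separator: Optional[str] = None             # how this part divides into subparts
    subparts: List["Part"] = field(default_factory=list)

def build(part, text):
    """A part becomes an element; its subparts become child elements, in input order."""
    element = ET.Element(part.name)
    if not part.subparts:
        element.text = text
        return element
    pieces = text.split(part.separator)
    for subpart, piece in zip(part.subparts, pieces):
        element.append(build(subpart, piece))
    return element

TEMPLATE = Part("the-whole", "\n", [
    Part("part-X", ",", [Part("subpart-1"), Part("subpart-2")]),
    Part("part-Y", ",", [Part("subpart-1"), Part("subpart-2"), Part("subpart-3")]),
])

tree = build(TEMPLATE, "a,b\nc,d,e")
print(ET.tostring(tree, encoding="unicode"))
```

The output structure follows the template one to one, which is why the input order cannot change.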
Examples
• positional text data
  – phone invoices
• separator-delimited text
• binary data
INVOICE                             INVOICE NUMBER: 44196
                                    CUSTOMER NUMBER: 25272
                                    PERSONAL REFERENCE: WORK

John Smith
Garden Avenue 40
43234 Bigtown

PHONE SPECIFICATION

DATE         UNITS     DURATION   NUMBER    PRICE
11.1.1992    5         307 min    37126     50.00
23.6.1995    10        193 min    53829     122.00
----------------------------------------------------------------
John Smith
Garden Avenue 40
43234 Bigtown
                             595324
                             17.8.1996   907.00
XW Wrapper Specification
<xw:wrapper xw:name="phone-invoice" xw:sourcetype="text"
            xmlns:xw="http://www.cs.uku.fi/XW/2001">
  <invoice xw:starter="\^INVOICE" xw:occurs="unbounded">
    …
  </invoice>
</xw:wrapper>
<xw:wrapper xw:name="phone-invoice" xw:sourcetype="text"
            xmlns:xw="http://www.cs.uku.fi/XW/2001">
  <invoice xw:starter="\^INVOICE" xw:occurs="unbounded">
    <identifierdata ...> ... </identifierdata>
    <specification xw:starter="\^PHONE SPECIFICATION" ...> ... </specification>
    <invoicedata xw:starter="\^----------" ...> ... </invoicedata>
  </invoice>
</xw:wrapper>
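The effect of the `xw:starter` attributes can be approximated with regular expressions. The sketch below is a hedged Python illustration, not the XW engine: `split_occurrences` and `split_parts` are hypothetical helpers, the shortened sample invoice is made up, and it assumes a starter is a pattern that must match at the beginning of a line (written here as `^` with `re.MULTILINE`; the slides do not spell out how XW itself interprets `\^`).

```python
import re

def split_occurrences(text, starter):
    """Cut text into chunks, each beginning where the starter pattern matches."""
    starts = [m.start() for m in re.finditer(starter, text, re.MULTILINE)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]

def split_parts(chunk, parts):
    """Divide one chunk into named parts, in order.

    A part with starter None begins where the previous search position is.
    """
    starts = []
    pos = 0
    for name, starter in parts:
        if starter is not None:
            match = re.compile(starter, re.MULTILINE).search(chunk, pos)
            if match is None:
                raise ValueError(f"starter for {name!r} not found")
            pos = match.start()
        starts.append((name, pos))
    ends = [start for _, start in starts[1:]] + [len(chunk)]
    return {name: chunk[start:end] for (name, start), end in zip(starts, ends)}

# Shortened, made-up invoice text for illustration only.
ONE_INVOICE = (
    "INVOICE   INVOICE NUMBER: 44196\n"
    "John Smith\n"
    "PHONE SPECIFICATION\n"
    "11.1.1992    5\n"
    "----------------\n"
    "595324\n"
)
INVOICE_PARTS = [
    ("identifierdata", None),
    ("specification", r"^PHONE SPECIFICATION"),
    ("invoicedata", r"^----------"),
]

invoices = split_occurrences(ONE_INVOICE * 2, r"^INVOICE")  # occurs="unbounded"
parts = split_parts(invoices[0], INVOICE_PARTS)
```

Because the label "INVOICE NUMBER:" never begins a line, only the first line of each invoice counts as a starter.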
<xw:wrapper xw:name="phone-invoice" xw:sourcetype="text"
            xmlns:xw="http://www.cs.uku.fi/XW/2001">
  <invoice xw:starter="\^INVOICE" xw:occurs="unbounded">
    <identifierdata xw:childterminator="\n" xw:ignoreemptysubpart="true">
      <invoicenumber xw:position="53 64"/>
      <customernumber xw:position="60 71"/>
      <personalreference xw:position="60 71"/>
      <name xw:position="1 22"/>
      <streetaddress xw:position="1 22"/>
      <postoffice xw:position="1 22"/>
    </identifierdata>
    <specification xw:starter="\^PHONE SPECIFICATION" ...> ... </specification>
    <invoicedata xw:starter="\^----------" ...> ... </invoicedata>
  </invoice>
</xw:wrapper>
<xw:wrapper ...>
  <invoice xw:starter="\^INVOICE" xw:occurs="unbounded">
    <identifierdata xw:childterminator="\n" ...> </identifierdata>
    <specification xw:starter="\^PHONE SPECIFICATION"
                   xw:childterminator="\n" xw:ignoreemptysubpart="true">
      <xw:ignore/>
      <specificationrow xw:occurs="unbounded">
        <date xw:position="1 12"/>
        <units xw:position="14 22"/>
        <duration xw:position="24 33"/>
        <number xw:position="35 43"/>
        <price xw:position="45 52"/>
      </specificationrow>
    </specification>
    <invoicedata xw:starter="\^----------" ...> ... </invoicedata>
  </invoice>
</xw:wrapper>
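The `xw:position` attributes of `specificationrow` can be read as column ranges. The Python sketch below illustrates that reading under explicit assumptions: positions are taken to be 1-based, inclusive start and end columns, extracted values are trimmed, and empty lines are skipped as `xw:ignoreemptysubpart="true"` suggests. The names `FIELDS`, `cut`, `parse_rows`, and `make_row` are hypothetical, and `make_row` only exists to build sample lines whose columns match the positions.

```python
# (element name, first column, last column), copied from the xw:position values.
FIELDS = [
    ("date", 1, 12),
    ("units", 14, 22),
    ("duration", 24, 33),
    ("number", 35, 43),
    ("price", 45, 52),
]

def cut(line, start, end):
    """Return the trimmed text in columns start..end (assumed 1-based, inclusive)."""
    return line[start - 1:end].strip()

def parse_rows(lines):
    """Turn each non-empty line into a dict of field name -> value."""
    rows = []
    for line in lines:
        if not line.strip():          # skip empty subparts
            continue
        rows.append({name: cut(line, start, end) for name, start, end in FIELDS})
    return rows

def make_row(date, units, duration, number, price):
    """Build a sample line with fields starting at columns 1, 14, 24, 35 and 45."""
    return date.ljust(13) + units.ljust(10) + duration.ljust(11) + number.ljust(10) + price

sample = [
    make_row("11.1.1992", "5", "307 min", "37126", "50.00"),
    "",
    make_row("23.6.1995", "10", "193 min", "53829", "122.00"),
]
rows = parse_rows(sample)
```

Each resulting dict corresponds to one `specificationrow` element, with keys in the same order as the child elements of the template.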
<xw:wrapper xw:name="phone-invoice" xw:sourcetype="text"
            xmlns:xw="http://www.cs.uku.fi/XW/2001">
  <invoice xw:starter="\^INVOICE" xw:occurs="unbounded">
    <identifierdata xw:childterminator="\n" xw:ignoreemptysubpart="true">
    </identifierdata>
    <specification xw:starter="\^PHONE SPECIFICATION"
                   xw:childterminator="\n" xw:ignoreemptysubpart="true">
    </specification>
    <invoicedata xw:starter="\^----------"
                 xw:childterminator="\n" xw:ignoreemptysubpart="true">
      <xw:ignore xw:occurs="4"/>
      <reference xw:position="30 48"/>
      <xw:collapse>
        <duedate xw:position="30 39"/>
        <totalsum xw:position="42 50"/>
      </xw:collapse>
    </invoicedata>
  </invoice>
</xw:wrapper>
Resulting XML
<invoice>
  <identifierdata>
    <invoicenumber>44196</invoicenumber>
    <customernumber>25272</customernumber>
    <personalreference>WORK</personalreference>
    <name>John Smith</name>
    <streetaddress>Garden Avenue 40</streetaddress>
    <postoffice>43234 Bigtown</postoffice>
  </identifierdata>
  <specification>
    <specificationrow>
      <date>11.1.1992</date>
      <units>5</units>
      <duration>307 min</duration>
      <number>37126</number>
      <price>50.00</price>
    </specificationrow>
    <specificationrow>
      <date>23.6.1995</date>
      <units>10</units>
      <duration>193 min</duration>
      <number>53829</number>
      <price>122.00</price>
    </specificationrow>
  </specification>
  <invoicedata>
    <reference>595324</reference>
    <duedate>17.8.1996</duedate>
    <totalsum>907.00</totalsum>
  </invoicedata>
</invoice>
Examples
• positional text data
• separator-delimited text
  – HL7 version 2.3 messages
• binary data
MSH|^~\&|KL-Lab||CCIMS|RDNT01|200001071300||ORU^R01...
PID|||311244A0112|ExamMod1|Smith^John||19441231|M...
OBR||76551|Res_01||||20000107060000|||||||||||||||||CH|C
OBX||NM|1535^aB-pO2^||11||||||F
NTE|||This is a comment for aB-pO2.
NTE|||Another comment for aB-pO2.
OBX||NM|1026^S -ALAT^||61|||*|||F
<!-- MSH, PID and OBR lines processed above -->
<xw:CHOICE xw:occurs='unbounded'>
  <xw:collapse xw:starter='\^OBX' xw:childseparator='|'>
    <xw:ignore xw:occurs='3'/>
    <observation/>
    <xw:ignore/>
    <result/>
    <xw:ignore xw:occurs='2'/>
    <flag/>
    <xw:ignore xw:occurs='2'/>
    <responsetype/>
  </xw:collapse>
  <xw:collapse xw:starter='\^NTE' xw:childseparator='|' xw:occurs='unbounded'>
    <xw:ignore xw:occurs='3'/>
    <xw:collapse/>
  </xw:collapse>
</xw:CHOICE>
<!-- MSH, PID and OBR lines processed above -->
<xw:CHOICE xw:occurs='unbounded'>
  <xw:collapse xw:starter='\^OBX' xw:childseparator='|'>
    <xw:ignore xw:occurs='3'/>
    <observation/>
    <xw:ignore/>
    <result/>
    <xw:ignore xw:occurs='2'/>
    <flag/>
    <xw:ignore xw:occurs='2'/>
    <responsetype/>
  </xw:collapse>
  <xw:ELEMENT xw:name='comment'>
    <xw:collapse xw:starter='\^NTE' xw:childseparator='|' xw:occurs='unbounded'>
      <xw:ignore xw:occurs='3'/>
      <xw:collapse/>
    </xw:collapse>
  </xw:ELEMENT>
</xw:CHOICE>
Resulting XML
<response>
  ...
  <observation>1535^aB-pO2^</observation>
  <result>11</result>
  <responsetype>F</responsetype>
  <comment>This is a comment for aB-pO2.Another comment for aB-pO2.</comment>
  <observation>1026^S -ALAT^</observation>
  <result>61</result>
  <flag>*</flag>
  ...
</response>
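The OBX/NTE handling can be retraced in a few lines of Python. This is a sketch under assumptions, not the XW engine: `OBX_FIELDS` and `wrap_results` are hypothetical names, `None` entries stand in for `xw:ignore`, consecutive NTE lines are concatenated into one `comment` element, and empty fields are assumed to produce no element (which matches the output shown, where the first observation has no `flag`).

```python
import xml.etree.ElementTree as ET

# One entry per '|'-separated field of an OBX line; None means "ignore this field".
OBX_FIELDS = [None, None, None, "observation", None, "result",
              None, None, "flag", None, None, "responsetype"]

def wrap_results(lines):
    """Build a <response> element from OBX and NTE lines, keeping input order."""
    response = ET.Element("response")
    comment = None                      # current <comment>, shared by adjacent NTE lines
    for line in lines:
        fields = line.split("|")
        if fields[0] == "OBX":
            comment = None
            for name, value in zip(OBX_FIELDS, fields):
                if name and value:      # skip ignored and empty fields
                    ET.SubElement(response, name).text = value
        elif fields[0] == "NTE":
            if comment is None:
                comment = ET.SubElement(response, "comment")
                comment.text = ""
            comment.text += "|".join(fields[3:])   # everything after three ignored fields
    return response

LINES = [
    "OBX||NM|1535^aB-pO2^||11||||||F",
    "NTE|||This is a comment for aB-pO2.",
    "NTE|||Another comment for aB-pO2.",
    "OBX||NM|1026^S -ALAT^||61|||*|||F",
]
xml_text = ET.tostring(wrap_results(LINES), encoding="unicode")
print(xml_text)
```

Splitting on `|` only handles the field level; component separators such as `^` are left inside the element content, as in the slide's output.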
Examples
• positional text data
• separator-delimited text
• binary data
  – packet of IP-based communications protocol
Binary data
length  16b    16b    16b    16b    4*8b    4*8b    varies
name    len    chk    id     off    src     dst     pay
type    short  short  short  short  4*byte  4*byte  array of bytes
<xw:wrapper xw:name="IP-like-protocol" xw:sourcetype="binary"
            xmlns:xw="http://www.cs.uku.fi/XW/2001">
  <datagram>
    <xw:ignore xw:name="total-length" xw:type="short"/>
    <checksum xw:type="short"/>
    <id xw:type="short"/>
    <segment-offset xw:type="short"/>
    ...
  </datagram>
</xw:wrapper>
<xw:wrapper xw:name="IP-like-protocol" xw:sourcetype="binary"
            xmlns:xw="http://www.cs.uku.fi/XW/2001">
  <datagram>
    <xw:ignore xw:name="total-length" xw:type="short"/>
    <checksum xw:type="short"/>
    <id xw:type="short"/>
    <segment-offset xw:type="short"/>
    <xw:ELEMENT xw:name="source-address">
      <a xw:type="byte"/>
      <b xw:type="byte"/>
      <c xw:type="byte"/>
      <d xw:type="byte"/>
    </xw:ELEMENT>
    <xw:ELEMENT xw:name="destination-address">
      <a xw:type="byte"/>
      <b xw:type="byte"/>
      <c xw:type="byte"/>
      <d xw:type="byte"/>
    </xw:ELEMENT>
    <xw:ELEMENT xw:name="payload">
      <xw:collapse xw:occurs="total-length - 16" xw:type="byte"
                   xw:numeric-output-format="hexadecimal"/>
    </xw:ELEMENT>
  </datagram>
</xw:wrapper>
Resulting XML
<datagram>
  <checksum>397485</checksum>
  <id>37</id>
  <segment-offset>0</segment-offset>
  <source-address>
    <a>193</a><b>167</b><c>232</c><d>253</d>
  </source-address>
  <destination-address>
    <a>193</a><b>167</b><c>224</c><d>8</d>
  </destination-address>
  <payload>e6a9ff120a</payload>
</datagram>
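The binary wrapper maps naturally onto fixed-width unpacking. The following Python sketch shows one way to reproduce it with the standard `struct` module; it is an illustration, not the XW engine. It assumes big-endian (network) byte order and unsigned fields, neither of which the slides state, and `HEADER`, `wrap_datagram`, and the sample packet are hypothetical. The sample checksum is an invented 16-bit value.

```python
import struct
import xml.etree.ElementTree as ET

# Four 16-bit shorts followed by two 4-byte addresses: 16 header bytes in total.
# Assumption: big-endian, unsigned.
HEADER = struct.Struct(">4H4B4B")

def wrap_datagram(packet):
    """Build a <datagram> element from the raw bytes of one packet."""
    total_length, checksum, ident, offset, *octets = HEADER.unpack_from(packet)

    datagram = ET.Element("datagram")    # total-length is read but not output
    for name, value in (("checksum", checksum), ("id", ident), ("segment-offset", offset)):
        ET.SubElement(datagram, name).text = str(value)

    for name, address in (("source-address", octets[:4]),
                          ("destination-address", octets[4:])):
        element = ET.SubElement(datagram, name)
        for tag, octet in zip("abcd", address):
            ET.SubElement(element, tag).text = str(octet)

    # The payload is the remaining total-length - 16 bytes, written as hexadecimal.
    payload = packet[HEADER.size:total_length]
    ET.SubElement(datagram, "payload").text = payload.hex()
    return datagram

# Made-up sample packet: 16 header bytes plus 5 payload bytes.
PACKET = HEADER.pack(21, 4660, 37, 0, 193, 167, 232, 253, 193, 167, 224, 8) \
    + bytes.fromhex("e6a9ff120a")
datagram_xml = ET.tostring(wrap_datagram(PACKET), encoding="unicode")
print(datagram_xml)
```

As in the wrapper, the length field is consumed only to size the payload and never appears in the output.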
Implementation & Future
• Java program
  – reads wrapper specification into DOM tree
  – produces output as SAX events: characters, startElement, endElement
• further development of XW
  – attribute generation
  – content generation from input
  – enhancements to alternative/optional parts
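The DOM-in, SAX-out architecture can be illustrated with Python's standard library, which offers the same two interfaces. This is a rough sketch and not the project's Java code: `emit` and `run` are hypothetical names, the tiny specification is a cut-down fragment of the phone-invoice wrapper, and only `xw:position` is interpreted (assumed 1-based, inclusive columns).

```python
import io
from xml.dom import minidom
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

XW_NS = "http://www.cs.uku.fi/XW/2001"

SPEC = """<xw:wrapper xmlns:xw="http://www.cs.uku.fi/XW/2001" xw:sourcetype="text">
  <specificationrow>
    <date xw:position="1 12"/>
    <units xw:position="14 22"/>
  </specificationrow>
</xw:wrapper>"""

def emit(node, line, handler):
    """Walk the DOM template in document order and fire SAX events for one input line."""
    for child in node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue                                  # skip whitespace text nodes
        handler.startElement(child.tagName, AttributesImpl({}))
        position = child.getAttributeNS(XW_NS, "position")
        if position:
            start, end = map(int, position.split())
            handler.characters(line[start - 1:end].strip())
        else:
            emit(child, line, handler)                # recurse into subparts
        handler.endElement(child.tagName)

def run(spec, line):
    """Parse the specification into a DOM tree and serialise the resulting SAX events."""
    out = io.StringIO()
    handler = XMLGenerator(out, encoding="utf-8")     # any SAX ContentHandler would do
    handler.startDocument()
    emit(minidom.parseString(spec).documentElement, line, handler)
    handler.endDocument()
    return out.getvalue()

output = run(SPEC, "11.1.1992".ljust(13) + "5")
print(output)
```

Emitting events instead of building an output tree lets a consumer, for example an XSLT processor with a SAX input, start working before the whole input has been read.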