SNU OOPSLA Lab. XML Applications The ubiquitous XML(5) © copyright 2001 SNU OOPSLA Lab.
XML
SNU OOPSLA Lab. October 2005

Contents
- Semistructured Data
- Introduction
- History
- XML Application
- DTD & XML Schema
- DOM & SAX
- Summary
- Online Resources

Semistructured Data(1/3)

Semistructured data and XML
- Integration of heterogeneous sources
- Data sources with non-rigid structure
  - Biological data
  - Web data

Characteristics of semistructured data
- Missing or additional attributes
- Multiple attributes
- Different types in different objects
- Self-describing, irregular data, no a priori structure

Semistructured Data(2/3)

[Figure: Data model, the Object Exchange Model (OEM). A labeled graph rooted at the complex object Bib (&o1), whose paper, book, and paper edges lead to the complex objects &o12, &o24, and &o29. Other edge labels include author, title, year, http, publisher, references, page, firstname, lastname, first, and last. Atomic objects hold values such as "Serge", "Abiteboul", 1997, "Victor", "Vianu", 122, and 133.]

Semistructured Data(3/3)

Syntax for Semistructured Data

    Bib: &o1 {
        paper: &o12 { … },
        book: &o24 { … },
        paper: &o29 {
            author: &o52 "Abiteboul",
            author: &o96 {
                firstname: &o243 "Victor",
                lastname: &o206 "Vianu"
            },
            title: &o93 "Regular path queries with constraints",
            references: &o12,
            references: &o24,
            pages: &o25 { first: &o64 122, last: &o92 133 }
        }
    }
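
As a rough illustration of the OEM syntax, the fragment can be held in memory as a table from object identifiers to either a list of labeled edges (complex object) or a plain value (atomic object). This is only a sketch: the dictionary layout and the helper names `follow` and `atoms` are assumptions made for this example, identifiers are written uniformly with the `&o` prefix, and the two objects whose contents the slide elides with "…" are left empty.

```python
# Hypothetical in-memory encoding of the OEM fragment from the slide.
# Complex object: list of (label, object-id) edges. Atomic object: plain value.
oem = {
    "&o1": [("paper", "&o12"), ("book", "&o24"), ("paper", "&o29")],
    "&o12": [],  # contents elided in the slide ("…")
    "&o24": [],  # contents elided in the slide ("…")
    "&o29": [
        ("author", "&o52"),
        ("author", "&o96"),
        ("title", "&o93"),
        ("references", "&o12"),
        ("references", "&o24"),
        ("pages", "&o25"),
    ],
    "&o52": "Abiteboul",
    "&o96": [("firstname", "&o243"), ("lastname", "&o206")],
    "&o243": "Victor",
    "&o206": "Vianu",
    "&o93": "Regular path queries with constraints",
    "&o25": [("first", "&o64"), ("last", "&o92")],
    "&o64": 122,
    "&o92": 133,
}


def follow(oid, path):
    """Return the ids of all objects reached from oid along the label path."""
    current = [oid]
    for label in path:
        reached = []
        for obj in current:
            value = oem[obj]
            if isinstance(value, list):  # only complex objects have edges
                reached.extend(target for (lbl, target) in value if lbl == label)
        current = reached
    return current


def atoms(oid, path):
    """Return the atomic values found at the end of the label path."""
    return [oem[o] for o in follow(oid, path) if not isinstance(oem[o], list)]
```

Because the data is irregular, the same label can lead to different kinds of objects: `atoms("&o1", ["paper", "author"])` only sees the author stored as a plain string, while `atoms("&o1", ["paper", "author", "lastname"])` only sees the author stored as a complex object.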

Introduction(1/4)

XML
- An acronym for "eXtensible Markup Language"
- A meta-language that describes other languages
- A data format for storing structured and semi-structured text for dissemination and ultimate publication, perhaps on a variety of media

Introduction(2/4)

Properties
- Tags enclose identifiable parts of the document
- Self-describing
- Physical/logical structure
  - Physical structure: allows components of the document, called entities, to be named and stored separately
  - Logical structure: allows a document to be divided into named units and sub-units, called elements
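
Introduction(3/4)

[Figure: Logical structure vs. physical structure. Logical structure: a document is divided into elements (units and sub-units). Physical structure: a document is composed of entities (internal or separate).]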

Introduction(4/4)

XML markup

    <warning>
      <para>This substance is hazardous to health</para>
      <para>See procedure 12A. 7 for information on protective clothing required.</para>
      <logo …/>
    </warning>

XML document

    <transaction>
      <time date="19980509"/>
      <amount>123</amount>
      <currency type="pounds"/>
      <from id="x98765">J. Smith</from>
      <to id="x56565">M. Jones</to>
    </transaction>
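
Because the markup is self-describing, a program can pull values out of the transaction document by element and attribute name alone. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the `summary` dictionary and its keys are assumptions made for this example, and the document text is the transaction from the slide with its quoting repaired.

```python
import xml.etree.ElementTree as ET

# The transaction document from the slide, with straight quotes.
TRANSACTION = """<transaction>
  <time date="19980509"/>
  <amount>123</amount>
  <currency type="pounds"/>
  <from id="x98765">J. Smith</from>
  <to id="x56565">M. Jones</to>
</transaction>"""

root = ET.fromstring(TRANSACTION)

# Look values up by tag name (element content) or attribute name.
summary = {
    "date": root.find("time").get("date"),
    "amount": int(root.find("amount").text),
    "currency": root.find("currency").get("type"),
    "from": (root.find("from").get("id"), root.find("from").text.strip()),
    "to": (root.find("to").get("id"), root.find("to").text.strip()),
}
```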

History(1/2)

[Figure: Timeline of markup languages with marks at 1960, 1986, 1992, and 1997, showing GM (Generalized Markup), SGML, HTML, and XML alongside the Internet and the WWW.]

History(2/2)

- 1960s: IBM GML (Generalized Markup Language)
- 1980s: ISO 8879, SGML (Standard Generalized Markup Language)
- Early 1990s: HTML (HyperText Markup Language)
- 1996: W3C's XML
- 1998: XML 1.0
- 1999: RDF (Resource Description Framework)

Application

[Figure: Data exchange applications. Components shown: XML, DTD, DBMS, Parser, SAX Events, DOM Tree, DOM API, XSL Processor, ASP/Java/VB, and HTML Browser.]

DOM (Document Object Model), SAX (Simple API for XML), XSL (eXtensible Stylesheet Language), ASP (Active Server Pages)

An XML Document

    <?xml version="1.0"?>
    <!DOCTYPE sigmodRecord SYSTEM "sigmodRecord.dtd">
    <sigmodRecord>
      <issue>
        <volume>1</volume>
        <number>1</number>
        <articles>
          <article>
            <title>XML Research Issues</title>
            <initPage>1</initPage>
            <endPage>5</endPage>
            <authors>
              <author AuthorPosition="00">Tom Hanks</author>
              …
            </authors>
          </article>
        </articles>
      </issue>
    </sigmodRecord>

DTD(1/2)

DTD (Document Type Definition)
- An optional but powerful feature of XML
- Comprises a set of declarations that define a document structure tree
- Some XML processors read the DTD and use it to build the document model in memory
- A parser uses it to check the validity of documents

DTD(2/2)

A DTD defines element types, attributes, and entities

Valid vs. invalid
- Valid: conforms to the DTD
- Invalid: fails to conform to the DTD

[Figure: Valid XML documents shown as a subset of well-formed XML documents.]
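
Well-formedness, the outer of the two categories, can be checked without any DTD. The sketch below uses Python's standard `xml.etree.ElementTree`, whose parser is non-validating: it reports syntax errors but does not check a document against a DTD, so validity would need a separate validating parser. The function name `is_well_formed` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

For instance, `is_well_formed('<to id="x56565">M. Jones</to>')` holds, while dropping the closing quote of the attribute value, or closing elements in the wrong order as in `<a><b></a>`, makes the parser reject the text.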

XML Schema

Schema
- W3C standard: specifies the structure of XML documents
- Data types for elements/attributes
  - String, int, float
  - Unordered sets are also allowed
  - Derivation of types is allowed
- Replaces DTDs
  - Removes the syntactic distinction between DTD and XML
  - Richer types compared to DTD

XML Schema Example

    <xsd:element name="article" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="title" type="xsd:string"/>
          <xsd:element name="initPage" type="xsd:string"/>
          <xsd:element name="endPage" type="xsd:string"/>
          <xsd:element name="author" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

DTD

    <!ELEMENT article (title,initPage,endPage,author)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT initPage (#PCDATA)>
    <!ELEMENT endPage (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
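
Both declarations state the same content model: an article holds title, initPage, endPage, and author, in that order, each containing only text. The sketch below checks that rule by hand for a single element; it is not a schema or DTD validator, and the names `EXPECTED_CHILDREN` and `matches_article_model` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Child order required by (title,initPage,endPage,author).
EXPECTED_CHILDREN = ["title", "initPage", "endPage", "author"]


def matches_article_model(article):
    """True if <article> has exactly the expected children, in order,
    and none of those children contains nested elements."""
    if article.tag != "article":
        return False
    children = list(article)
    tags_in_order = [child.tag for child in children] == EXPECTED_CHILDREN
    text_only = all(len(child) == 0 for child in children)
    return tags_in_order and text_only


good = ET.fromstring(
    "<article><title>XML Research Issues</title><initPage>1</initPage>"
    "<endPage>5</endPage><author>Tom Hanks</author></article>"
)
swapped = ET.fromstring(
    "<article><initPage>1</initPage><title>XML Research Issues</title>"
    "<endPage>5</endPage><author>Tom Hanks</author></article>"
)
```

Here `matches_article_model(good)` holds and `matches_article_model(swapped)` does not, since a sequence fixes the order of the children.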

DOM(1/2)

Characteristics
- Hierarchical (tree) object model for XML documents
- Associates a list of children with every node
- Preserves the sequence of the elements in the XML document

[Figure: DOM tree of the XML document, with sigmodRecord at the root, issue below it, then volume, number, and articles, and title, initPage, and endPage at the leaves.]

DOM(2/2)

DOM interfaces
- Node: the base data type of the DOM.
- Element: the vast majority of the objects you'll deal with are Elements.
- Attr: represents an attribute of an element.
- Text: the actual content of an Element or Attr.
- Document: represents the entire XML document.
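
These interfaces can be seen side by side in a small sketch using `xml.dom.minidom` from Python's standard library, one DOM implementation among many. The document text is a whitespace-free version of the sigmodRecord example with the elided authors left out; the variable names are assumptions made for this example.

```python
from xml.dom import minidom

# Compact version of the sigmodRecord example (no whitespace between tags,
# so every child node here is an Element or a Text node with real content).
XML = (
    "<sigmodRecord><issue><volume>1</volume><number>1</number>"
    "<articles><article><title>XML Research Issues</title>"
    "<initPage>1</initPage><endPage>5</endPage>"
    '<authors><author AuthorPosition="00">Tom Hanks</author></authors>'
    "</article></articles></issue></sigmodRecord>"
)

doc = minidom.parseString(XML)                    # Document: the whole tree
root = doc.documentElement                        # Element: <sigmodRecord>
author = doc.getElementsByTagName("author")[0]    # Element: <author>
attr = author.getAttributeNode("AuthorPosition")  # Attr
text = author.firstChild                          # Text: content of <author>

# Every node keeps an ordered list of children, in document order.
issue = root.firstChild
issue_children = [node.tagName for node in issue.childNodes]
```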

SAX(1/2)

DOM: expensive to materialize for a large XML collection

Characteristics
- Event-driven: fires an event for every open tag/end tag
- Does not require full parsing
- Enables custom object model building

[Figure: Event-driven parsing. The application creates a parser and gives it a document handler; while parsing the document, the parser calls startDocument(), startElement(), characters(), endElement(), and endDocument() on the handler.]

SAX(2/2)

The SAX API actually defines four interfaces for handling events
- EntityResolver
- DTDHandler
- DocumentHandler
- ErrorHandler

All of these interfaces are implemented by HandlerBase.
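
The callback sequence can be observed with a handler that simply records each event. This sketch uses Python's standard `xml.sax` module rather than the Java API the slides describe; Python's module follows SAX 2, where `ContentHandler` plays the role of `DocumentHandler`. The class name `EventLogger` and the event strings are assumptions made for this example.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records every SAX callback it receives, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def startElement(self, name, attrs):
        self.events.append("startElement:" + name)

    def characters(self, content):
        # Text may arrive in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append("characters:" + content.strip())

    def endElement(self, name):
        self.events.append("endElement:" + name)

    def endDocument(self):
        self.events.append("endDocument")


logger = EventLogger()
xml.sax.parseString(b"<amount>123</amount>", logger)
```

After parsing, `logger.events` holds the callbacks in the order the parser fired them; no tree of the document is ever built.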

DOM vs SAX(1/3)

Why use DOM?
- Need to know a lot about the structure of a document
- Need to move parts of the document around
- Need to use the information in the document more than once

Why use SAX?
- Only need to extract a few elements from an XML document

DOM vs SAX(2/3)

    <book id="1">
    <verse>Sing, O goddess, the anger of Achilles son of Peleus, that brought countless ills upon the Achaeans. Many a brave soul did it send hurrying down to Hades, and many a hero did it yield a prey to dogs and vultures, for so were the counsels of Jove fulfilled from the day on which the son of Atreus, king of men, and great Achilles, first fell out with one another.</verse>
    <verse>And which of the gods was it that set them on to quarrel? It was the son of Jove and Leto; for he was angry with the king and sent a pestilence upon ...

- The SAX API would be much more efficient
- Doing this with the DOM would take a lot of memory
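
The slide does not say which task it has in mind. Assuming it is something like counting the verses in a long text, a SAX handler needs only a single counter, however large the document is. The names `VerseCounter` and `count_verses` are assumptions made for this sketch, which uses Python's standard `xml.sax` module.

```python
import xml.sax


class VerseCounter(xml.sax.ContentHandler):
    """Counts <verse> start tags without storing any of the text."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "verse":
            self.count += 1


def count_verses(xml_bytes):
    """Parse xml_bytes once and return the number of <verse> elements."""
    handler = VerseCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.count
```

For a document on disk, `xml.sax.parse` accepts a filename or file object and streams it through the same handler, so memory use stays flat.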

DOM vs SAX(3/3)

    ...
    <address>
      <name>
        <title>Mrs.</title>
        <first-name>Mary</first-name>
        <last-name>McGoon</last-name>
      </name>
      <street>1401 Main Street</street>
      <city>Anytown</city>
      <state>NC</state>
      <zip>34829</zip>
    </address>
    <address>
      <name>
    ...

What if we were parsing an XML document containing 10,000 addresses and wanted to sort them by last name?
- DOM would automatically store all of the data
- We could use DOM functions to move the nodes in the DOM tree
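
A sort by last name might look like the following sketch, which uses `xml.dom.minidom` from Python's standard library. The function name `sort_addresses` is an assumption made for this example, and the code assumes every address is a direct child of the root element and has a non-empty last-name; the slide does not show the enclosing element.

```python
from xml.dom import minidom


def sort_addresses(xml_text):
    """Return a DOM Document whose <address> children are ordered by last name."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement

    # The whole document is already in memory; pick out the address elements.
    addresses = [
        node
        for node in root.childNodes
        if node.nodeType == node.ELEMENT_NODE and node.tagName == "address"
    ]

    def last_name(address):
        return address.getElementsByTagName("last-name")[0].firstChild.data

    # appendChild on a node that is already in the tree moves it to the end,
    # so re-appending in sorted order rearranges the addresses in place.
    for address in sorted(addresses, key=last_name):
        root.appendChild(address)
    return doc
```

Calling `sort_addresses(text).toxml()` then serializes the reordered tree. With SAX, the application would have to build and sort its own data structure instead.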

Summary

XML
- eXtensible Markup Language
- A data format for storing structured and semi-structured text
- Physical/logical structure

DTD & XML Schema
- Establish formal document structure rules

DOM & SAX API
- DOM: need to know a lot about the structure of a document
- SAX: need to extract a few elements from an XML document

Online Resources

XML tutorial
- http://www.xml.com
- http://www.w3c.org
- http://www.w3schools.com/
- http://www.xmltraining.com/course-search-xml+online+tutorials
- http://xmlfiles.com/