
XML at a neighborhood university near you

Innovation 2005

September 16, 2005

Kwok-Bun Yue

University of Houston-Clear Lake

9/10/2005 Bun Yue, http://dcm.uhcl.edu/yue

Content

• What is XML?

• XML Modeling

• XML Parsing

• XML Transformation

• XML and Databases

• XML at UHCL

• Conclusions


What is XML?

• XML stands for eXtensible Markup Language.

• XML is a system for defining, validating, and sharing document formats.

• Standards organization: World Wide Web Consortium (W3C): http://www.w3.org/XML/.


XML Basic Constructs

• XML uses elements (marked up with tags) and attributes to describe document structure and properties.

• Unlike HTML, XML is extensible.

• Authors can use XML to define a new language for a given application.

• XML is a meta-language.


A Simple XML Example

<?xml version="1.0"?>
<memo priority="very high">
  <from>Bun Yue</from>
  <to>Everybody</to>
  <body>
    Hello, welcome!
  </body>
</memo>

• The XML declaration (with the version) must be on the first line.

• <memo> is the root element.

• Everything inside the root element is the XML content.


Why XML?

• XML captures semantics well.

• Simple.

• Text.

• Standard.

• Wide support.

• Validation.

• Abundance of tools.


Some Disadvantages

• Verbose

• Text

• The ordered tree model may not fit every application.


Some UHCL XML Applications

• Web Services: SOAP, UDDI, WSDL, etc.

• XHTML

• VoiceXML

• Wireless Markup Language (WML)

• Scalable Vector Graphics (SVG), e.g. a triangle fractal


XML Modeling

• Devise an XML vocabulary to capture an application.

• May use available modeling tools and languages, such as UML.

• XML basically uses an ordered tree model.


XML Tree Model

XML file:

<?xml version="1.0"?>
<a att="123"> <b>Hi</b> <b>There</b> <c>Bye</c></a>

Document Root
  Prolog
  a
    b
    b
    c

An XML tree showing element nodes only


Syntax & Validation

• Every XML document must be well-formed, i.e. satisfy the basic syntax rules.

• XML documents may additionally be validated against a schema.

• Validation:

– Cost: time and effort.

– Benefit: increased reliability.
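The well-formedness requirement can be checked by simply running a parser over the text. The following is a minimal sketch using the SAX parser that ships with the JDK; the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for illustration, and no schema is consulted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            // DefaultHandler ignores all events but rethrows fatal errors,
            // which is how the parser reports a syntax violation.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b>Hi</b></a>")); // properly nested
        System.out.println(isWellFormed("<a><b>Hi</a>"));     // <b> is never closed
    }
}
```

A document can pass this check and still fail validation, since validation adds the rules of a particular schema on top of the basic syntax.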


XML Validation Languages

• XML validation languages:

– Document Type Definition (DTD)

– XML Schema

– Schematron

– RELAX NG


Document Type Definition (DTD)

• A grammar to determine the validity of an XML document.

• An XML document that satisfies the rules of a DTD is said to be valid (validated) with respect to that DTD.

• DTD is part of the XML language standard.


A Simple DTD

<!ELEMENT persons (person+)>
<!ELEMENT person (name, pet*)>
<!ATTLIST person
  id ID #REQUIRED
  spouse IDREF #IMPLIED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT pet (#PCDATA)>


DTD Validation

• An XML document validated by the DTD:

<persons>
  <person id="p12324" spouse="p10001">
    <name>Adam</name>
    <pet>Eva</pet>
  </person>
  <person id="p10001">
    <name>John</name>
  </person>
</persons>


Not Validated

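<persons>
  <person id="p12324" spouse="p10010">
    <pet>Eva</pet>
    <name>Adam</name>
  </person>
  <person>
    <name>John</name>
    <name>Jack</name>
  </person>
</persons>

• spouse="p10010" does not match the id of any person.

• In the first person, pet appears before name.

• The second person has no id and has two name elements.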
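DTD validation of the two persons documents can be tried out with the validating parser bundled with the JDK. The sketch below is only one way to do it: the class DtdValidation and its members are hypothetical names, the DTD is embedded as an internal subset, attribute values use single quotes for readability, and the person content model is assumed to be (name, pet*) so that a person without a pet is accepted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, for illustration only.
class DtdValidation {

    // The DTD as an internal subset. Assumption: pet* (zero or more pets).
    static final String DOCTYPE =
        "<!DOCTYPE persons [\n"
      + "<!ELEMENT persons (person+)>\n"
      + "<!ELEMENT person (name, pet*)>\n"
      + "<!ATTLIST person id ID #REQUIRED spouse IDREF #IMPLIED>\n"
      + "<!ELEMENT name (#PCDATA)>\n"
      + "<!ELEMENT pet (#PCDATA)>\n"
      + "]>\n";

    static final String VALID =
        "<persons><person id='p12324' spouse='p10001'><name>Adam</name><pet>Eva</pet></person>"
      + "<person id='p10001'><name>John</name></person></persons>";

    static final String INVALID =
        "<persons><person id='p12324' spouse='p10010'><pet>Eva</pet><name>Adam</name></person>"
      + "<person><name>John</name><name>Jack</name></person></persons>";

    /** Returns true if the document body is valid against DOCTYPE. */
    static boolean isValid(String body) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are reported through error(); turn them into exceptions.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false; // not valid (or not well-formed)
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(VALID));
        System.out.println(isValid(INVALID));
    }
}
```

The error handler stops at the first violation; a handler that collects the exceptions instead would report every problem in the document.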


Limitations of DTD

• Every schema language has limited expressive power.

• DTD is simple but not very expressive: for example, it has no rich data types and is not namespace-aware.

• Other languages, e.g. XML Schema, are more expressive (and more complicated).


XML Parsing

• A large collection of XML parsers is available in various languages: Java, Perl, C#, …

• Two popular classes:

– DOM (Document Object Model): builds an XML tree.

– SAX (Simple API for XML): event driven (push).


SAX

• XML input is converted to a sequence of events (e.g. startElement, endElement, characters, …).

• Programmers define event handlers to handle the events.


SAX Example

// Java
public void startElement(String namespaceURI,
    String lName,   // local name
    String qName,   // qualified name
    Attributes attrs)
    throws SAXException {
  numElements++;    // numElements is a data member.
} // startElement
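To show where such a handler method lives, here is a small self-contained sketch that counts elements with the JDK's SAX parser. The class name ElementCounter and the static helper count are hypothetical; only startElement and the numElements field come from the slide.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler class, for illustration only.
class ElementCounter extends DefaultHandler {

    private int numElements = 0;

    // Called by the parser once for every start tag it encounters.
    @Override
    public void startElement(String namespaceURI, String lName,
                             String qName, Attributes attrs) {
        numElements++;
    }

    /** Parses the text and returns the number of elements seen. */
    static int count(String xml) {
        try {
            ElementCounter handler = new ElementCounter();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.numElements;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The tree-model example: a, b, b and c give 4 elements.
        System.out.println(count("<a att='123'> <b>Hi</b> <b>There</b> <c>Bye</c></a>"));
    }
}
```

Because the parser pushes events as it reads, no tree is kept in memory; the handler only holds the state it needs (here a single counter).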


DOM

• DOM (Document Object Model): a W3C standard.

• A “platform- and language-neutral interface” for accessing and updating documents.

• A DOM parser parses an XML document and builds an XML tree.

• DOM classes can then be used to access and manipulate the tree.


DOM Example

try { // Java
  // Parse the input XML file.
  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
  // Without namespace awareness, getLocalName() returns null.
  factory.setNamespaceAware(true);
  DocumentBuilder builder = factory.newDocumentBuilder();
  document = builder.parse(new File(argv[0]));

  System.out.println("Name of root element of " + argv[0] + " = " +
      document.getDocumentElement().getLocalName());
} …
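Once the tree is built, DOM classes can be used to walk it. As a rough sketch, the following lists the child elements of the root for the small document from the tree-model slide; the class DomChildren and the method childElementNames are hypothetical names, and the document is read from a string rather than a file to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class DomChildren {

    /** Returns the names of the element children of the root, in document order. */
    static List<String> childElementNames(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            Element root = document.getDocumentElement();

            List<String> names = new ArrayList<>();
            // Children also include text nodes (e.g. whitespace); keep elements only.
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(n.getNodeName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints [b, b, c]
        System.out.println(
            childElementNames("<a att='123'> <b>Hi</b> <b>There</b> <c>Bye</c></a>"));
    }
}
```

The order of the result reflects the ordered tree model: the two b elements come before c because they do so in the document.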


XML Transformation

• Transformation from XML to XML and other formats.

• Can be programmed directly with XML parsers.

• XSLT: Extensible Stylesheet Language Transformations.


XSLT

• Rule-based language for XML transformation.

• Contains a set of templates (rules) for identifying components to be acted on.


XSLT Template


XSLT Example

• An XSLT template to replace an <a> element by <b>, preserving its text content:

<xsl:template match="a">
  <b><xsl:value-of select="."/></b>
</xsl:template>
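A template like this one can be applied from Java through the standard javax.xml.transform API. The sketch below wraps the template in a minimal stylesheet and runs it over a string; the class name RenameAToB and the method transform are hypothetical, and single quotes are used inside the stylesheet only to avoid escaping in Java strings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
class RenameAToB {

    // A complete stylesheet containing just the one template.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='a'>"
      + "<b><xsl:value-of select='.'/></b>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies STYLESHEET to the given XML text and returns the result as text. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            // Leave out the <?xml ...?> declaration so only the result element is written.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<a>Hello</a>"));
    }
}
```

Note that xsl:value-of copies only the string value of the matched element, so any child elements inside <a> would be flattened to their text.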


Storage of XML

• XML can be stored as files or in a database.

• Leading databases support XML storage:

– Native XML databases

– XML-enhanced databases


XQuery

• W3C Standard

• For effectively querying and retrieving information from diverse XML sources.

• Plays a role similar to that of SQL for relational databases.


XQuery Example

<figures>
{
  for $f in doc("diagrams.xml")//figure
  return
    <title>{ $f/title }</title>
}
</figures>


XML Related Courses

• CSCI 4230 Internet Application Development: started covering XML in Spring 2000.

– Example project: using the MS XML parser, parse XML weather information from an external site, retrieve its information, and present it in a specific HTML format.


XML Related Courses

• CSCI 5733 XML Application Development: started in Spring 2002.

• Covers the topics of this presentation in full detail, and much more.

• Programming assignments on XML parsers, XSLT, XPath, WML, SVG, etc.


Capstone Projects

• Graduate capstone projects from external companies.

• Some XML project examples:

– SVG

– XML difference engine

– XML-based workflow


XML Research at UHCL

• Some examples:

– Effective storage of XML in relational databases.

– Mapping of DTD to relational schema.

– Software metrics for XML Schema.


Conclusions

• XML: wide potential for applications and research.

• UHCL is an early adopter.

• Many UHCL students are trained in XML.


Questions?

• Any Questions?

• Thanks!