Introduction to XML Based on tutorials of B. Cormia, D. Suciu, H. Boley, S. Decker, M. Sintek, E. R....

Post on 21-Dec-2015


Introduction to XML

Based on tutorials of B. Cormia, D. Suciu, H. Boley, S. Decker, M. Sintek, E. R. Harold and others

[Figure: Tools; Web Services; Integration & Interoperability; Data (XML)]

Such a format, which describes the content of a Web document rather than the way to display it, is among the basic needs of intelligent Web applications

Introduction

XML is a text-based markup language that is fast becoming the standard for data interchange on the Web.

Like HTML, XML uses tags. But unlike HTML, XML tags identify the data rather than specifying how to display it. Where an HTML tag says something like "display this data in bold font" (<b>...</b>), an XML tag acts like a field name in your program. It puts a label on a piece of data that identifies it (for example: <message>...</message>).

HTML vs. XML

<h1> Bibliography </h1>

<p> <i> Foundations of DBs</i>, Abiteboul, Hull, Vianu

<br> Addison-Wesley, 1995

<p> <i> Logics for DBs and ISs </i>, Chomicki, Saake, eds.

<br> Kluwer, 1998

<bibliography>

<book> <title> Foundations of DBs </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<publisher> Addison-Wesley </publisher>

...

</book>

<book> ... <editor> Chomicki </editor>... </book> ...

</bibliography>
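Because the XML tags name the data, a program can pick out fields by tag name. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the sample document is a shortened copy of the bibliography above, and the function name list_books is only an illustrative choice.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's bibliography (only the first book).
BIB_XML = """
<bibliography>
  <book>
    <title>Foundations of DBs</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison-Wesley</publisher>
  </book>
</bibliography>
"""


def list_books(xml_text):
    """Return (title, authors, publisher) for every <book> element."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        title = book.findtext("title")
        authors = [author.text for author in book.findall("author")]
        publisher = book.findtext("publisher")
        books.append((title, authors, publisher))
    return books


books = list_books(BIB_XML)
print(books)
```

The same lookup against the HTML version would have to guess that the italic text is a title and that the text after <br> is a publisher.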

HTML tags: presentation, generic document structure

XML tags: content, "semantic", (DTD-) specific

External Presentations from XML

XML Markup:

<address> <name>Xaver M. Linde</name> <street>Wikingerufer 7</street> <town>10555 Berlin</town></address>

External Presentations:

Xaver M. Linde
Wikingerufer 7
10555 Berlin

. . .

XML stylesheets are, e.g., usable to generate different presentations

XML to XML Transformations

XML Markup 1:

<address> <name>Xaver M. Linde</name> <street>Wikingerufer 7</street> <town>10555 Berlin</town></address>

XML Markup 2:

<address> <name>Xaver M. Linde</name> <place> <street>Wikingerufer 7</street> <town>10555 Berlin</town> </place></address>

XML stylesheets are also usable to transform XML representations
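The transformation from the flat address markup to the one with a nested <place> element can be sketched in a few lines. This is only an illustration in plain Python with xml.etree.ElementTree, not a stylesheet; a stylesheet language such as XSLT would express the same rewrite declaratively. The function name wrap_in_place is hypothetical.

```python
import xml.etree.ElementTree as ET

# The flat address markup from the slide.
MARKUP_1 = (
    "<address><name>Xaver M. Linde</name>"
    "<street>Wikingerufer 7</street>"
    "<town>10555 Berlin</town></address>"
)


def wrap_in_place(xml_text):
    """Move <street> and <town> under a new <place> child of <address>."""
    address = ET.fromstring(xml_text)
    place = ET.Element("place")
    for tag in ("street", "town"):
        child = address.find(tag)
        if child is not None:
            address.remove(child)
            place.append(child)
    address.append(place)
    return ET.tostring(address, encoding="unicode")


markup_2 = wrap_in_place(MARKUP_1)
print(markup_2)
```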

XML Queries

XML Markup:

<address> <name>Xaver M. Linde</name> <street>Wikingerufer 7</street> <town>10555 Berlin</town></address>

XML Query (XML-QL):

WHERE <address> <name>Xaver M. Linde</name> <street>$s</street> <town>$t</town> </address>
CONSTRUCT <binding> <s>$s</s> <t>$t</t> </binding>

XML queries can select subelements of XML elements

Result:

<binding> <s>Wikingerufer 7</s> <t>10555 Berlin</t> </binding>
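To make the query concrete, here is a rough Python emulation of what it does: match the <address> pattern, bind $s and $t, and construct a <binding> element. It is a sketch that assumes the standard xml.etree.ElementTree module; it is not an XML-QL engine, and bind_street_and_town is a made-up name.

```python
import xml.etree.ElementTree as ET

ADDRESS_XML = (
    "<address><name>Xaver M. Linde</name>"
    "<street>Wikingerufer 7</street>"
    "<town>10555 Berlin</town></address>"
)


def bind_street_and_town(xml_text, name):
    """Emulate the WHERE/CONSTRUCT query for one <address> document."""
    address = ET.fromstring(xml_text)
    # WHERE part: the pattern only matches an <address> with the given name.
    if address.tag != "address" or address.findtext("name") != name:
        return None
    # CONSTRUCT part: copy the bound values $s and $t into a new element.
    binding = ET.Element("binding")
    ET.SubElement(binding, "s").text = address.findtext("street")
    ET.SubElement(binding, "t").text = address.findtext("town")
    return ET.tostring(binding, encoding="unicode")


binding = bind_street_and_town(ADDRESS_XML, "Xaver M. Linde")
no_match = bind_street_and_town(ADDRESS_XML, "Somebody Else")
print(binding)
```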

PART_OF and HAS_PART Example

kitchen PART_OF flat

flat HAS_PART kitchen

XML

<FLAT> kitchen </FLAT>

Role of an Object

kitchen: to be a place for making food in the flat

XML

<FLAT>
<PLACE_FOR_MAKING_FOOD> kitchen </PLACE_FOR_MAKING_FOOD>
</FLAT>

Multi-Roles Object

Vagan

University of Kharkov

to be Head of Department

to be a lecturer

to be Head of Research Lab.

to be Member of Council

to be Head of Exchange Programs

XML (1)

<UNIVERSITY_OF_KHARKOV>
<MEMBER_OF_COUNCIL> Vagan </MEMBER_OF_COUNCIL>
<HEAD_OF_EXCHANGE_PROGRAM> Vagan </HEAD_OF_EXCHANGE_PROGRAM>
<HEAD_OF_DEPARTMENT> Vagan </HEAD_OF_DEPARTMENT>
<HEAD_OF_RESEARCH_LAB> Vagan </HEAD_OF_RESEARCH_LAB>
<LECTURER> Vagan </LECTURER>
</UNIVERSITY_OF_KHARKOV>

XML (2)

<VAGAN>
<ROLES_IN_UNIVERSITY_OF_KHARKOV>
<ROLE_1> Member of Council </ROLE_1>
<ROLE_2> Head of Exchange Program </ROLE_2>
<ROLE_3> Head of Department </ROLE_3>
<ROLE_4> Head of Research Lab </ROLE_4>
<ROLE_5> Lecturer </ROLE_5>
</ROLES_IN_UNIVERSITY_OF_KHARKOV>
</VAGAN>

XML (3)

<UNIVERSITY_OF_KHARKOV>
<ROLE type="MEMBER OF COUNCIL"> Vagan </ROLE>
<ROLE type="HEAD OF EXCHANGE PROGRAM"> Vagan </ROLE>
<ROLE type="HEAD OF DEPARTMENT"> Vagan </ROLE>
<ROLE type="HEAD OF RESEARCH LAB"> Vagan </ROLE>
<ROLE type="LECTURER"> Vagan </ROLE>
</UNIVERSITY_OF_KHARKOV>

Multi-Contextual Role of Object

Vagan

University of Kharkov: to be Head of AI Department

University of Jyvaskyla: to be a lecturer

XML (1)

<VAGAN>
<ROLE_IN_UNIVERSITY_OF_KHARKOV>
Head of Department
</ROLE_IN_UNIVERSITY_OF_KHARKOV>
<ROLE_IN_UNIVERSITY_OF_JYVASKYLA>
Lecturer
</ROLE_IN_UNIVERSITY_OF_JYVASKYLA>
</VAGAN>

XML (2)

<VAGAN>
<ROLE place="UNIVERSITY OF KHARKOV"> Head of Department </ROLE>
<ROLE place="UNIVERSITY OF JYVASKYLA"> Lecturer </ROLE>
</VAGAN>
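With the context kept in an attribute, a program can look up a role by place without knowing the element names in advance. A minimal sketch, assuming Python's standard xml.etree.ElementTree and the attribute-based markup written with straight quotation marks; roles_by_place is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

VAGAN_XML = """
<VAGAN>
  <ROLE place="UNIVERSITY OF KHARKOV"> Head of Department </ROLE>
  <ROLE place="UNIVERSITY OF JYVASKYLA"> Lecturer </ROLE>
</VAGAN>
"""


def roles_by_place(xml_text):
    """Map each value of the place attribute to the role text."""
    root = ET.fromstring(xml_text)
    return {role.get("place"): role.text.strip() for role in root.findall("ROLE")}


roles = roles_by_place(VAGAN_XML)
print(roles["UNIVERSITY OF JYVASKYLA"])
```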

Multilevel Context Roles

Vagan

Kharkov University

AI Department

Ukraine

... citizen

... employer

... Head

XML

<COUNTRY>
<NAME> Ukraine </NAME>
<LEADING_UNIVERSITY>
<NAME> Kharkov University </NAME>
<BEST_DEPARTMENT>
<NAME> AI Department </NAME>
<HEAD>
<NAME> Vagan </NAME>
</HEAD>
</BEST_DEPARTMENT>
</LEADING_UNIVERSITY>
</COUNTRY>

Not enough

Contents

XML Specification

Document Type Definitions

Cascading Style Sheets

Querying XML

XML Specification

Elements, Attributes, and Values

XML uses the same building blocks as HTML: elements, attributes, and values

Elements can carry attributes

Attributes have values

Values are enclosed in quotation marks (" ")

Simple XML Element (no attributes)

<position>professor</position>

opening tag: <position>, closing tag: </position>

name of the element: position

content of the element: professor

Simple XML Element

<position>professor</position>

is equivalent (apart from the line break inside the content) to

<position>professor
</position>

and is different from

<diagnosis>professor
</diagnosis>

XML Element with Attribute

<position place="university">professor</position>

opening tag: <position place="university">, closing tag: </position>

name of the element: position

attribute of the element: place

value of the attribute: university

content of the element: professor
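The parts of an element map directly onto what a parser reports. As a small sketch with Python's standard xml.etree.ElementTree, using the element above written with straight quotation marks:

```python
import xml.etree.ElementTree as ET

element = ET.fromstring('<position place="university">professor</position>')

print(element.tag)     # name of the element
print(element.attrib)  # attributes and their values, as a dictionary
print(element.text)    # content of the element
```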

XML Element with Two Attributes

<position place="university" type="teaching">
professor
</position>

is similar but not equivalent to

<position>
<name>professor</name>
<place>university</place>
<type>teaching</type>
</position>

Do Not Forget the Quotation Marks

... place="university" ...

Quotation marks are obligatory around the value of an attribute

Nominal vs. Numerical Attributes

<price currency="Euro">
49.90
</price>

<constant value="3.14">
</constant>

Empty Element

<position/>

name of the element: position; the opening and closing tags are merged together

is equivalent to

<position></position>

Empty Element with Attribute

<picture location="/images/blueball.gif"/>

is equivalent to

<picture location="/images/blueball.gif"></picture>

Tags Must be Nested Correctly

Correct:

<department>
<head>
vagan
</head>
</department>

Incorrect (the tags overlap):

<department>
<head>
vagan
</department>
</head>

Case Matters

<department>
Artificial Intelligence
</department>

is not the same as

<Department>
Artificial Intelligence
</Department>

and the following is not well-formed, because the opening and closing tags differ in case:

<Department>
Artificial Intelligence
</department>

A Root Element is Required

Correct (a single root element encloses everything):

<CS_Faculty>
<department>
Artificial Intelligence
</department>
<department>
Information Systems
</department>
</CS_Faculty>

Incorrect (no root element):

<department>
Artificial Intelligence
</department>
<department>
Information Systems
</department>
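The nesting, case and root-element rules can be checked mechanically, since a parser rejects any document that breaks them. A minimal sketch with Python's standard xml.etree.ElementTree; the helper is_well_formed and the sample labels are illustrative names.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


SAMPLES = {
    "nested correctly": "<department><head>vagan</head></department>",
    "nested incorrectly": "<department><head>vagan</department></head>",
    "case mismatch": "<Department>Artificial Intelligence</department>",
    "single root": (
        "<CS_Faculty><department>Artificial Intelligence</department>"
        "<department>Information Systems</department></CS_Faculty>"
    ),
    "two roots": (
        "<department>Artificial Intelligence</department>"
        "<department>Information Systems</department>"
    ),
}

results = {label: is_well_formed(text) for label, text in SAMPLES.items()}
for label, ok in results.items():
    print(label, ok)
```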

Writing five special symbols

To write the five special symbols:

Type &amp; for ampersand (&)

Type &lt; for the less than sign (<)

Type &gt; for the greater than sign (>)

Type &quot; to create a double quote (")

Type &apos; to create an apostrophe (')
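In a program the five replacements are usually left to a library routine. The sketch below uses escape from Python's standard xml.sax.saxutils, which handles &, < and > by default and takes the two quote characters through an extra dictionary; the sample sentence is invented for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'AT&T says "x < y" isn\'t > z'

# escape() replaces &, < and > itself; the quotes are passed in explicitly.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# A parser turns the entity references back into the original characters.
round_trip = ET.fromstring("<message>" + escaped + "</message>").text
print(round_trip)
```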

Declaring the XML Version

At the very beginning of the document type: <?xml

Then type: version="1.0"

Type: ?>

<?xml version="1.0" ?>

Declaring the XML Version: Obligatory and Optional Attributes

<?xml version="number"
[encoding="encoding"]
[standalone="yes|no"] ?>

version is obligatory; encoding and standalone are optional

Encoding Attribute

<?xml version="1.0" encoding="US-ASCII" ?>

Encoding Attribute Values

US-ASCII: US-ASCII is a 7-bit encoding scheme that covers the English-language alphabet.

UTF-8: UTF-8 is a variable-length encoding scheme built on 8-bit units. Characters from the English-language alphabet are all encoded using a single 8-bit byte. Characters for other languages are encoded using 2, 3 or 4 bytes. UTF-8 therefore produces compact documents for the English language, but larger documents for some other languages (for example, 3 bytes per character for Chinese).

UTF-16: UTF-16 is an encoding scheme built on 16-bit units. It can encode all Unicode characters, including those of ideogram-based languages like Chinese: most characters in common use take 2 bytes, and the remaining ones take 4 bytes (a surrogate pair). An English-language document that uses UTF-16 will be about twice as large as the same document encoded using UTF-8. Documents written in languages such as Chinese, however, will be smaller using UTF-16 (2 bytes per character instead of 3).
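The size trade-off between the encodings is easy to measure. The sketch below encodes one sample word per language (the sample words are chosen only for this illustration) and counts the bytes; "utf-16-le" is used so that no byte-order mark is added to the count.

```python
# One sample word per language; the words are arbitrary illustrations.
SAMPLES = {
    "English": "professor",
    "Ukrainian": "професор",
    "Chinese": "教授",
}


def encoded_sizes(text):
    """Return the number of bytes the text needs in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # little-endian, no BOM
    }


sizes = {language: encoded_sizes(word) for language, word in SAMPLES.items()}
for language, size in sizes.items():
    print(language, size)
```

The English word doubles in size under UTF-16, the Cyrillic word is the same size in both, and the Chinese word is smaller in UTF-16.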

Standalone Attribute

<?xml version="1.0" standalone="no" ?>

An outside DTD is needed to correctly interpret the XML document

<?xml version="1.0" standalone="yes" ?>

An outside DTD is not needed

A DTD (Document Type Definition) describes the elements and attributes that may appear in the XML document and is used to check its syntactic structure

The optional standalone attribute in the XML declaration specifies whether the document depends on declarations in an external DTD. The value must be "yes" or "no".

Writing comments

To write comments:

Type <!--

Write the desired comments

Type -->

<!-- This is a comment -->

Namespaces

Namespaces are a recent addition to the XML specification. The use of namespaces is not mandatory in XML, but it's often wise.

Namespaces were created to ensure uniqueness among XML elements.

<CS_Faculty xmlns='http://www.academic.com'>
</CS_Faculty>

element: CS_Faculty

attribute: xmlns (XML namespace)

value: namespace identifier (URL)

The area of validity of the namespace is the element that declares it, together with its content

Namespace Prefix

<stock xmlns:edi='http://ecommerce.org/schema'>
<!-- the 'price' element's namespace is http://ecommerce.org/schema -->
<edi:price units='Euro'>32.18</edi:price>
...
</stock>

Namespace prefix: edi
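A parser resolves the prefix to the namespace identifier, so the prefix itself is only a local shorthand. The sketch below, using Python's standard xml.etree.ElementTree, parses the stock example (without the elided part) and looks the element up through a prefix-to-URL mapping.

```python
import xml.etree.ElementTree as ET

STOCK_XML = (
    "<stock xmlns:edi='http://ecommerce.org/schema'>"
    "<edi:price units='Euro'>32.18</edi:price>"
    "</stock>"
)

root = ET.fromstring(STOCK_XML)

# The mapping used for the lookup; the prefix here need not match the document's.
namespaces = {"edi": "http://ecommerce.org/schema"}
price = root.find("edi:price", namespaces)

print(root.tag)   # no namespace: plain name
print(price.tag)  # qualified name in {namespace}localname form
print(price.get("units"), price.text)
```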

Document Type Definitions

Document type definitions

A DTD specifies how elements inside an XML document should relate to each other

It also provides grammar rules for the document and each of the elements

A document that conforms to the XML specification and to the rules outlined by its DTD is considered to be "valid"

(Not to be confused with a well-formed document, which adheres to XML syntax rules)

Declaring DTD in XML Document

<!DOCTYPE CS_Faculty SYSTEM "faculty.dtd">

CS_Faculty: root element in the XML file

SYSTEM: keyword denoting that the DTD resides in a separate local file

"faculty.dtd": file with the DTD

Declaring DTD in XML Document

<?xml version="1.0" standalone="no" ?>
<!DOCTYPE CS_Faculty SYSTEM "faculty.dtd">
<!-- Here begins the XML data -->
<CS_Faculty>
<department> Artificial Intelligence </department>
<department> Information Systems </department>
</CS_Faculty>

Declaring an internal DTD

At the top of the XML document, after the XML declaration, type:

<!DOCTYPE root [

where root corresponds to the name of the root element in the document that the DTD will be applied to.

Type: ]> to complete the DTD.

Example code

<?xml version="1.0" ?>
<!DOCTYPE CS_Faculty [
]>

Leave room between [ and ] for document type definitions.

Declaring a personal external DTD

In the XML declaration at the top of the document, add standalone="no"

Type <!DOCTYPE root (name of root element)

Type SYSTEM to indicate that the external DTD is a personal, non-standardized DTD

Type "file.dtd", where file.dtd is the DTD file

Type > to complete the document type declaration

Writing a personal external DTD

Create a new text file faculty.dtd with a text editor

Define the rules for the DTD (document type definitions for defining elements and attributes, and entities and notations)

Save the file as text only with the .dtd extension

Thus use a declaration like:

<?xml version="1.0" standalone="no" ?>
<!DOCTYPE CS_Faculty SYSTEM "faculty.dtd">

Naming an external DTD

Type:

+ if the DTD is approved by a standards body

- if the DTD is not a recognized standard

Type: //Owner//DTD where Owner identifies who wrote or maintains the DTD

Type a space followed by a label for the DTD, then //XX// where XX defines the language

Naming an external DTD (example)

-//Vagan Terziyan//DTD Faculties//EN//

Vagan Terziyan is the owner

Faculties is the DTD description

EN means the DTD is written in English

Declaring a public external DTD

In the XML declaration at the top of the document, add standalone="no"

Type <!DOCTYPE root (name of root element)

Type PUBLIC to indicate that the external DTD is a standardized set of rules

Type "DTD_name", where DTD_name is the official name of the DTD you are referencing

Type "file.dtd", where file.dtd is the DTD file

Type > to complete the document type declaration

Example code

<?xml version="1.0" standalone="no"?>
<!DOCTYPE CS_Faculty PUBLIC
"-//Vagan Terziyan//DTD Faculties//EN//"
"http://www.ac.com/XML/examples/faculty.dtd">

Defining elements and attributes in a DTD

Type <!ELEMENT

Type the name of the element (tag)

Type EMPTY if the element has no contents

Otherwise specify the contents, or type ANY to allow any combination of elements or text

Type > to complete the element declaration

Defining an element to contain only text

Type: <!ELEMENT

Type: name of the element

Next type: (#PCDATA)

Finally type: >

<!ELEMENT faculty (#PCDATA)>

Defining an element to contain one child

Type: <!ELEMENT

Type: name of the element

Next type: (child of the element)

Finally type: >

<!ELEMENT faculty (department)>

Defining an element to contain a sequence

Type: <!ELEMENT

Type: name of the element

Type: (child1, child2, ..., childn of the element)

Type: >

<!ELEMENT faculty

(deans_office, department)>

Defining choices

Type: <!ELEMENT

Type: name of the element

Type: (child1 | child2 | ... | childn of the element)

Type: >

<!ELEMENT faculty

(deans_office, (department | research_lab))>

Defining how many units

Type ? to indicate that the unit can appear at most once, if at all (zero or one)

Type + to indicate that the unit must appear at least once (one or more)

Or type * to indicate that the unit can appear as many times as necessary, or not at all (zero or more)

Defining how many units

<!ELEMENT faculty (deans_office, financial_office*, library?, (department+ | research_lab+))>
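A content model of this kind behaves like a regular expression over the names of the child elements. The sketch below is a hand translation of the faculty declaration into a Python regular expression, written only to illustrate what ?, + and * permit; it is not a DTD validator, and the names FACULTY_MODEL and matches_faculty_model are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of
#   (deans_office, financial_office*, library?, (department+ | research_lab+))
# Each child name is followed by one space in the string being matched.
FACULTY_MODEL = re.compile(
    r"(deans_office )"          # exactly one
    r"(financial_office )*"     # zero or more
    r"(library )?"              # zero or one
    r"((department )+|(research_lab )+)"  # one or more of a single kind
)


def matches_faculty_model(xml_text):
    """Check the sequence of child element names against the model."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return FACULTY_MODEL.fullmatch(child_names) is not None


print(matches_faculty_model(
    "<faculty><deans_office/><department/><department/></faculty>"
))
print(matches_faculty_model(
    "<faculty><deans_office/><department/><research_lab/></faculty>"
))
```

The second call mixes departments and research labs, which the choice in the model does not allow.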

About attributes

Attributes add information about an element

Information contained in attributes tends to be about the content of the page

Elements are perhaps better for information you want to display; attributes are better for information about information

Defining simple attributes

Type <!ATTLIST

Type element (the name of the element)

Type attribute (the name of the attribute)

Type CDATA, or type (choice_1 | choice_2)

Type "default" (a default value in quotation marks), or type #REQUIRED (attribute must be explicitly provided), or type #IMPLIED (attribute is optional)

Type >

Defining attributes example

<!ELEMENT slideshow (slide+)>

<!ATTLIST slideshow

title CDATA #REQUIRED

date CDATA #IMPLIED

author CDATA #REQUIRED

language (English | German) #IMPLIED

>

<!ELEMENT slide (title, item*)>

Creating shortcuts for text: Predefined Entities

In the DTD type <!ENTITY

Type abbreviation

Type "content"

Type >

Using shortcuts for text

In the XML document type: &

Type: abbreviation, where abbreviation is the identifying name of your entity (and matches the one used in the previous example)

Type: ;

Example

<!ENTITY product "WonderWidget">
<!ENTITY products "WonderWidgets">

<slideshow title="&product; Slide Show" ...
<!-- TITLE SLIDE -->
<slide type="all">
<title>Wake up to &products;!</title>
</slide>
<!-- OVERVIEW -->
<slide type="all">
<title>Overview</title>
<item>Why <em>&products;</em> are great</item>
<item/>
<item>Who <em>buys</em> &products;</item>
</slide>
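How the shortcuts expand can be seen by parsing a document that declares them in an internal DTD. A small sketch, assuming Python's standard xml.etree.ElementTree, whose underlying parser expands internally declared entities; the document is a cut-down version of the slideshow example.

```python
import xml.etree.ElementTree as ET

SLIDESHOW_XML = """<?xml version="1.0"?>
<!DOCTYPE slideshow [
  <!ENTITY product "WonderWidget">
  <!ENTITY products "WonderWidgets">
]>
<slideshow title="&product; Slide Show">
  <slide type="all">
    <title>Wake up to &products;!</title>
  </slide>
</slideshow>
"""

root = ET.fromstring(SLIDESHOW_XML)

# The parser has already replaced the entity references with their content.
print(root.get("title"))
print(root.findtext("slide/title"))
```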

Cascading Style Sheets

CSS

CSS, originally designed for HTML, can also be used to format XML documents for presentation

External style sheets give global control of presentation

The Anatomy of Style

A style is made up of a selector and one or more declarations

Declarations determine how the chosen elements will be displayed

A selector can be as simple as an element name

Declarations have a property and a value: color:red or font:bold 12pt Tekton

Creating an External Style Sheet

To create a style sheet:

Create a text document

Type the name of the selector for the elements

Type { to begin the properties that should be applied

Define as many properties as desired

Type } to mark the end of the rule

Sample CSS code

name {display:block; position:absolute}
intro {display:block; border:medium dotted red; padding:5px; margin-top:5px}
picture {display:block}
population {display:inline}
latin_name {display:inline}
more_info {display:inline}

Calling a Style Sheet for an XML Document

To create the processing instruction manually:

At the top of the document, after the initial XML declaration, type: <?xml-stylesheet type="text/css"

Then type: href="style.css"

Finally, type: ?> to complete the processing instruction

<?xml-stylesheet type="text/css" href="style.css" ?>

Setting the Text Color

To set the text color:

Type color:

Type colorname, where colorname is one of 16 predefined colors

Or type #rrggbb, where rr, gg and bb are two-digit hexadecimal values (00-FF)

Or type rgb(r,g,b), where each of r, g, b can be a value from 0-255

Or type rgb(r%,g%,b%), where r, g, b specify the percentage of red, green, or blue

Aligning Text

You can set up certain elements to always be aligned to the right, left, center, or justified, as desired.

To align text:

Type text-align:

Type left to align text to the left

Type right to align text to the right

Type center to center the text in the middle of the screen

Type justify to align the text on both the right and left

Underlining Text

To underline text:

Type text-decoration:

To underline text, type underline

For a line above the text, type overline

To strike out text, type line-through

To get rid of underlining, overlining, etc., type text-decoration:none

Querying XML

A Query Language for XML: XML-QL

Designed in AT&T Labs (w. Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy)

Implementation on top of Strudel (Alin Deutsch, Mary Fernandez)

Prototype: http://www.research.att.com/sw/tools/xmlql

XML-QL Example Data (bib.xml)

<bib>
<book year="1995">
<title> An Introduction to DB Systems </title>
<author> <lastname> Date </lastname> </author>
<publisher> <name> Addison-Wesley </name> </publisher>
</book>
<book year="1995">
<title> Foundations for OR Databases </title>
<author> <lastname> Date </lastname> </author>
<author> <lastname> Darwen </lastname> </author>
<publisher> <name> Addison-Wesley </name> </publisher>
</book>
</bib>

XML-QL Example Data (bib.dtd)

<!ELEMENT book (author+, title, publisher)>

<!ATTLIST book year CDATA #IMPLIED>

<!ELEMENT article (author+, title, year?, (shortversion|longversion))>

<!ATTLIST article type CDATA #IMPLIED>

<!ELEMENT publisher (name, address)>

<!ELEMENT author (firstname?, lastname)>

Query Example

Find all the names of the authors whose publisher is Addison-Wesley:

WHERE <book>

<publisher><name> Addison-Wesley </name></publisher>

<title> $t </title>

<author> $a </author>

</book> IN "www.a.b.c/bib.xml"

CONSTRUCT $a

Query Example (syntax)

The use of </> instead of </XXX>:

WHERE <book>

<publisher><name> Addison-Wesley </></>

<title> $t </>

<author> $a </>

</> IN "www.a.b.c/bib.xml"

CONSTRUCT $a

Result of the query:

The output is in XML form:

<lastname> Date </lastname>

<lastname> Darwen </lastname>

<lastname> Date </lastname>
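For comparison, the same selection can be written as an ordinary program. The sketch below uses Python's standard xml.etree.ElementTree on a copy of the bib.xml data; it is an emulation of the query, not XML-QL, it returns only the text of each <lastname>, and authors_published_by is a hypothetical name.

```python
import xml.etree.ElementTree as ET

BIB_XML = """
<bib>
  <book year="1995">
    <title>An Introduction to DB Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1995">
    <title>Foundations for OR Databases</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>
"""


def authors_published_by(xml_text, publisher_name):
    """Collect the last names of all authors of books from one publisher."""
    root = ET.fromstring(xml_text)
    found = []
    for book in root.findall("book"):
        # Corresponds to the <publisher><name>...</name></publisher> pattern.
        if book.findtext("publisher/name") != publisher_name:
            continue
        # One result per <author> match, like one binding of $a.
        for author in book.findall("author"):
            found.append(author.findtext("lastname"))
    return found


authors = authors_published_by(BIB_XML, "Addison-Wesley")
print(authors)
```

The results come out in document order here, which need not be the order shown on the slide; as in the query result, an author appears once per matching book.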

XML-QL: Pattern Matching and Selections

WHERE <book><publisher>Springer</publisher>

<author> $a </author>

<year> $y </year>

</book> IN "www.a.b.c/bib.xml",

1991 <= $y AND $y <= 1994

CONSTRUCT $a

XML-QL: Construction of New XML Data

WHERE <book> <publisher> Springer </>

<title> $t </>

<author> $a </>

</> IN "www.a.b.c/bib.xml"

CONSTRUCT <result> <author> $a </>

<title> $t </>

</>

Constructing new XML data: (result)

<result>

<author> <lastname> Date </lastname> </author>

<title> An Introduction to DB Systems </title>

</result>

<result>

<author> <lastname> Date </lastname> </author>

<title> Foundation for OR Databases</title>

</result>

<result>

<author> <lastname> Darwen </lastname> </author>

<title> Foundation for Object/Relational Databases: The Third Manifesto </title>

</result>

XML-QL Semantics

Step 1: find all substitutions

Step 2: construct XML result

WHERE $X .. $Y .. $Z

CONSTRUCT $X $Y $Z
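The two steps can be mimicked in a short program: first collect every substitution of the variables, then build one result per substitution. This is a sketch in Python with the standard xml.etree.ElementTree, under the simplifying assumption that $a is bound to the author's last name and $t to the title text; the function names are illustrative.

```python
import xml.etree.ElementTree as ET

BIB_XML = """
<bib>
  <book>
    <title>An Introduction to DB Systems</title>
    <author><lastname>Date</lastname></author>
  </book>
  <book>
    <title>Foundations for OR Databases</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
  </book>
</bib>
"""


def find_substitutions(root):
    """Step 1: one dictionary of variable bindings per (title, author) match."""
    return [
        {"t": book.findtext("title"), "a": author.findtext("lastname")}
        for book in root.findall("book")
        for author in book.findall("author")
    ]


def construct(substitutions):
    """Step 2: build one <result> element per substitution."""
    built = []
    for binding in substitutions:
        result = ET.Element("result")
        ET.SubElement(result, "author").text = binding["a"]
        ET.SubElement(result, "title").text = binding["t"]
        built.append(ET.tostring(result, encoding="unicode"))
    return built


substitutions = find_substitutions(ET.fromstring(BIB_XML))
results = construct(substitutions)
for line in results:
    print(line)
```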

Conclusions

XML is for structuring data

XML looks a bit like HTML

XML is text, but isn't meant to be read

XML is a family of technologies

XML is modular

XML is the basis for RDF and the Semantic Web

XML is license-free, platform-independent and well-supported
