Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.


Page 1: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

Sistemi basati su conoscenza XML

Prof. M.T. PAZIENZA

a.a. 2004-2005

Page 2: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

Introduction to XML

HTML (1990) was designed to display data (documents), and to focus on how data looks

XML (1996) was designed to describe data (documents), and to focus on what data is

HTML is about displaying information, XML is about describing information.

Both derive from SGML (1986).

XML is a standard for describing content in addition to presentation aspects.

Page 3: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

HTML

HTML is a markup language: it augments regular text with “marks” that hold special meaning for the Web browser handling the document.

Commands in the language are called tags (start – end),

<TAG>, </TAG>

and have a fixed meaning.

HTML is adequate to represent the structure of documents only for display purposes.

Page 4: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML (EXtensible Markup Language)

XML tags are not predefined in XML. The author must define his own tags and his own document structure.

XML uses a DTD (Document Type Definition) to describe any type of data (document).

XML with a DTD is designed to be self-descriptive.

XML is free and extensible

XML is a cross-platform, software- and hardware-independent tool for transmitting information.

Page 5: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML does not DO anything

XML was not designed to DO anything.

It is just “pure information” wrapped in XML tags. Someone must write a piece of software to send it, receive it or display it.

XML is not a language; it is a syntax supporting creation of personalized markup languages.

Page 6: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML does not DO anything

Ex:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

The “note” has a header, and a message body. It also has sender and receiver information. But still, this XML document does not DO anything. Someone must write a piece of software to send it, receive it or display it.
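The remark that someone must write software to display the note can be illustrated with a minimal Python sketch based on the standard library module `xml.etree.ElementTree`. The function name `display_note` and the output layout are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The note from the slide, as bytes so the encoding declaration is honoured.
NOTE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def display_note(xml_bytes):
    """Turn the note into a small plain-text message (hypothetical layout)."""
    root = ET.fromstring(xml_bytes)
    return (
        f"To: {root.findtext('to')}\n"
        f"From: {root.findtext('from')}\n"
        f"Subject: {root.findtext('heading')}\n"
        f"\n"
        f"{root.findtext('body')}"
    )


print(display_note(NOTE_XML))
```

The XML itself stays passive; only the program decides what "displaying a note" means.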

Page 7: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML to Exchange Data

XML was designed to store, carry and exchange data. With XML, data can be exchanged between incompatible systems.

In the real world, computer systems and databases contain data in incompatible formats.

Converting the data to XML can greatly reduce the complexity of data exchange and create data that can be read by many different types of applications.

Page 8: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML used to Share Data

With XML, plain text files can be used to share data.

Since XML data is stored in plain text format, XML provides a software- and hardware-independent way of sharing data.

This makes it much easier to create data that different applications can work with. It also makes it easier to expand or upgrade a system to new operating systems, servers, applications, and new browsers. 

Page 9: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML used to Store Data

With XML, plain text files can be used to store data.

XML can also be used to store data in files or in databases. Applications can be written to store and retrieve information from the store, and generic applications can be used to display the data.

Page 10: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML Syntax

The syntax rules of XML are very simple and very strict. The rules are very easy to learn, and very easy to use.

Creating software that can read and manipulate XML is very easy to do.

Page 11: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML Syntax

An element, delimited by a start tag and an end tag, is the primary building block of an XML document. XML element names are case sensitive, and elements must be properly nested. Elements are related as parents and children.

With XML, the tag <Letter> is different from the tag <letter>. Opening and closing tags must therefore be written with the same case.

Attributes provide additional information about the element. Their values (enclosed in quotes) are inside the start tag of an element. An attribute is a name-value pair separated by an equals sign (=).

Entities are shortcuts for portions of common text (an entity reference starts with “&” and ends with “;”).
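A short Python sketch (standard library only) can make these three notions concrete. The sample element `Letter`, its `priority` attribute and the helper `parses` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# One element with one attribute; the text uses predefined entity references.
doc = ET.fromstring('<Letter priority="high">Tom &amp; Jerry &lt;cc&gt;</Letter>')

tag = doc.tag                   # element name, case preserved
priority = doc.get("priority")  # attribute value, quotes removed
text = doc.text                 # entity references replaced by the parser


def parses(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


same_case = parses("<Letter>Dear Tove</Letter>")
mixed_case = parses("<Letter>Dear Tove</letter>")  # end tag differs in case
```

Here `mixed_case` is false because `<Letter>` and `</letter>` are treated as two different names.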

Page 12: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML Syntax

Comments (arbitrary text) may be inserted anywhere in an XML document outside other markup (a comment starts with “<!--” and ends with “-->”).

Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

An example of a comment:

<!-- declarations for <head> & <body> -->

Note that the grammar does not allow a comment ending in --->. The following example is not well-formed:

<!-- B+, B, or B--->
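The two comment examples can be checked with a parser. This is a minimal sketch using Python's standard library; the wrapper element `doc` and the helper name `is_well_formed` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Markup characters are allowed inside a comment.
good = "<doc><!-- declarations for <head> & <body> --></doc>"

# A double hyphen may only appear as part of the closing "-->".
bad = "<doc><!-- B+, B, or B---></doc>"

good_ok = is_well_formed(good)
bad_ok = is_well_formed(bad)
```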

Page 13: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML Syntax

A document type definition (DTD) is the set of rules that lets an author specify their own set of elements, attributes and entities. A DTD specifies which elements can be used and the constraints on those elements.

A DTD defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements.

XML Schema is an XML based alternative to DTD.

Page 14: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

Why use a DTD?

XML provides an application independent way of sharing data.

With a DTD, independent groups of people can agree to use a common DTD for interchanging data.

Any application can use a standard DTD to verify that data received from the outside world is valid.

Page 15: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML Syntax

All XML elements must have a closing tag. The XML declaration is only an apparent exception:

<?xml version="1.0" encoding="ISO-8859-1"?>

The XML declaration is not an XML element: it belongs to the prolog of the document, and it must not have a closing tag.

The XML declaration defines the XML version and the character encoding used in the document.

Page 16: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML Syntax

All XML documents must have a root tag

The first tag in an XML document is the root tag.

All XML documents must contain a single tag pair to define the root element (e.g. <note>).

All other elements must be nested within the root element.

All elements can have sub-elements (children). Sub-elements must be correctly nested within their parent element:

<root> <child> <subchild>.....</subchild> </child> </root>

In the previous example (the note), the root has 4 child elements (to, from, heading, body).
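The single-root rule and the parent/child nesting can be observed with a parser. The following Python sketch uses only the standard library; the helper name `second_root_error` is hypothetical.

```python
import xml.etree.ElementTree as ET

note = ET.fromstring(
    "<note><to>Tove</to><from>Jani</from>"
    "<heading>Reminder</heading>"
    "<body>Don't forget me this weekend!</body></note>"
)

root_name = note.tag
# Iterating over an element yields its direct children in document order.
child_names = [child.tag for child in note]


def second_root_error():
    """Try to parse a text with two top-level elements; return the error text."""
    try:
        ET.fromstring("<note>first</note><note>second</note>")
    except ET.ParseError as err:
        return str(err)
    return None


error = second_root_error()
```

The second `<note>` is rejected because everything after the first root element is outside the document.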

Page 17: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML Syntax

Attribute values must always be quoted

With XML, it is illegal to omit quotation marks around attribute values. 

XML elements can have attributes in name/value pairs just like in HTML.

In XML the attribute value must always be quoted.

Page 18: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML Syntax

<?xml version="1.0"?> <note date=12/11/99> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>

Incorrect (the value of the date attribute is not quoted)

<?xml version="1.0"?> <note date="12/11/99"> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>

Correct (the value of the date attribute is quoted)
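A parser makes the difference between the two versions visible. This is a minimal Python sketch with the standard library; the function name `read_date` is an assumption, and the note is shortened for brevity.

```python
import xml.etree.ElementTree as ET

UNQUOTED = "<note date=12/11/99><to>Tove</to></note>"
QUOTED = '<note date="12/11/99"><to>Tove</to></note>'


def read_date(xml_text):
    """Return the date attribute, or None if the text is not well-formed."""
    try:
        return ET.fromstring(xml_text).get("date")
    except ET.ParseError:
        return None


unquoted_result = read_date(UNQUOTED)
quoted_result = read_date(QUOTED)
```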

Page 19: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML Syntax

White Space is Preserved

CR / LF is Converted to LF

A new line is always stored as LF
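Both rules can be observed by parsing small documents. The sketch below uses Python's standard library; the element name `body` and the sample texts are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Runs of spaces inside character data reach the application unchanged.
spaced = ET.fromstring(b"<body>Hello     my name is   Tove</body>")

# A Windows-style line break (CR LF) in the input is reported as a single LF.
windows = ET.fromstring(b"<body>line one\r\nline two</body>")

spaced_text = spaced.text
windows_text = windows.text
```

This differs from HTML rendering, where a browser collapses consecutive spaces when displaying the text.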

Page 20: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML Syntax

There is nothing special about XML. It is just plain text with the addition of some XML tags enclosed in angle brackets.

Software that can handle plain text can also handle XML. In a simple text editor, the XML tags will be visible and will not be handled specially.

In an XML-aware application however, the XML tags can be handled specially. The tags may or may not be visible, or have a functional meaning, depending on the nature of the application.

Page 21: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML Elements

XML Elements are Extensible

XML documents can be extended to carry more information.

XML Elements have Relationships

Elements are related as parents and children

Page 22: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML Elements

Book Title: My First XML

Chapter 1: Introduction to XML

What is HTML

What is XML

Chapter 2: XML Syntax

Elements must have a closing tag

Elements must be properly nested

Page 23: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML element (book description)

<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>

Page 24: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML element (book description)

book is the root element. title, prod, and chapter are child elements of book. book is the parent element of title, prod, and chapter, which are siblings (or sister elements) because they have the same parent.

Page 25: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

Elements have Content

Elements can have different content types.

An XML element is everything from (including) the element's start tag to (including) the element's end tag.

An element can have element content, mixed content, simple content, or empty content. An element can also have attributes.

Page 26: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

Elements have Content

In the book description:

book has element content, because it contains other elements;

chapter has mixed content because it contains both text and other elements;

para has simple content (or text content) because it contains only text;

prod has empty content because it contains neither text nor child elements (its information is carried by its attributes).
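The four content types can be told apart programmatically. The following Python sketch classifies each element of the book description; the function name `content_type` is hypothetical, and treating whitespace-only text as "no text" is a simplifying assumption of this example.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""


def content_type(element):
    """Classify an element as element, mixed, simple or empty content."""
    children = list(element)
    # Text directly inside the element: its own text plus the tails of its children.
    pieces = [element.text or ""] + [child.tail or "" for child in children]
    has_text = any(piece.strip() for piece in pieces)
    if children and has_text:
        return "mixed"
    if children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


book = ET.fromstring(BOOK_XML)
kinds = {el.tag: content_type(el) for el in book.iter()}
```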

Page 27: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

Element Naming

Names can contain letters, numbers, and other characters

Names must not start with a number or punctuation character (an underscore is allowed)

Names should not start with the letters xml (or XML or Xml ..), which are reserved

Names cannot contain spaces
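The naming rules above can be approximated with a regular expression. This Python sketch is deliberately simplified: it is ASCII-only (the XML specification also allows many non-ASCII letters), it leaves out the colon, which is used for namespace prefixes, and it assumes that an underscore is an acceptable first character. The names `NAME_PATTERN` and `is_acceptable_name` are hypothetical.

```python
import re

# First character: letter or underscore; then letters, digits, ".", "_" or "-".
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_acceptable_name(name):
    """Check a candidate element name against the simplified rules."""
    if not NAME_PATTERN.fullmatch(name):
        return False
    # Names beginning with "xml" in any letter case are set aside for XML itself.
    return not name.lower().startswith("xml")


candidates = ["story.date", "_id", "1st", "first name", "xmlNote", "-x"]
results = {name: is_acceptable_name(name) for name in candidates}
```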

Page 28: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

Element Naming

XML documents often have a corresponding database, in which fields exist corresponding to elements in the XML document. A good practice is to use the naming rules of the database for the elements in the XML documents

Page 29: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

Ex. XML News document

<?xml version="1.0"?>
<nitf>
  <head>
    <title>Colombia Earthquake</title>
  </head>
  <body>
    <body.head>
      <headline>
        <hl1>143 Dead in Colombia Earthquake</hl1>
      </headline>
      <byline>
        <bytag>By Jared Kotler, Associated Press Writer</bytag>
      </byline>
      <dateline>
        <location>Bogota, Colombia</location>
        <story.date>Monday January 25 1999 7:28 ET</story.date>
      </dateline>
    </body.head>
  </body>
</nitf>
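Because the news item is described rather than formatted, an application can pick out exactly the fields it needs. The following Python sketch (standard library only) extracts three of them from a shortened copy of the document; the helper name `first_text` is an assumption.

```python
import xml.etree.ElementTree as ET

NEWS_XML = """<?xml version="1.0"?>
<nitf>
  <head><title>Colombia Earthquake</title></head>
  <body>
    <body.head>
      <headline><hl1>143 Dead in Colombia Earthquake</hl1></headline>
      <dateline>
        <location>Bogota, Colombia</location>
        <story.date>Monday January 25 1999 7:28 ET</story.date>
      </dateline>
    </body.head>
  </body>
</nitf>"""


def first_text(root, tag):
    """Return the text of the first element with the given name, if any."""
    for element in root.iter(tag):
        return element.text
    return None


news = ET.fromstring(NEWS_XML)
headline = first_text(news, "hl1")
location = first_text(news, "location")
story_date = first_text(news, "story.date")
```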

Page 30: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

DTD

A DTD is enclosed in

<!DOCTYPE name [DTD declaration ]>

where name is the name of the outermost enclosing tag, and [DTD declaration ] is the text of the rules of the DTD.

The DTD starts with the outermost element, called the root element.

Page 31: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

Internal DTD

<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

The DTD is interpreted like this:

!ELEMENT note defines the element "note" as having four elements: "to,from,heading,body".

!ELEMENT to defines the "to" element to be of the type "#PCDATA" (parsed character data).

!ELEMENT from defines the "from" element to be of the type "#PCDATA",

and so on.
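Python's standard library parser checks well-formedness but does not validate against a DTD, so the sketch below checks the content model of "note" by hand. It is only an illustration of what the declarations mean, not a DTD validator; the names `NOTE_MODEL` and `conforms_to_note_model` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Mirrors <!ELEMENT note (to,from,heading,body)>:
# exactly these four children, in exactly this order.
NOTE_MODEL = ["to", "from", "heading", "body"]


def conforms_to_note_model(xml_text):
    """Hand-written check of the note content model (illustration only)."""
    root = ET.fromstring(xml_text)
    if root.tag != "note":
        return False
    if [child.tag for child in root] != NOTE_MODEL:
        return False
    # (#PCDATA) children hold character data only, so no nested elements.
    return all(len(child) == 0 for child in root)


ok = conforms_to_note_model(
    "<note><to>Tove</to><from>Jani</from>"
    "<heading>Reminder</heading><body>Don't forget me!</body></note>"
)
wrong_order = conforms_to_note_model(
    "<note><from>Jani</from><to>Tove</to>"
    "<heading>Reminder</heading><body>Don't forget me!</body></note>"
)
missing_body = conforms_to_note_model(
    "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading></note>"
)
```

The last two documents are well-formed XML, yet they do not follow the rules the DTD states for "note".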

Page 32: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

CDATA Sections

CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup.

CDATA sections begin with the string "<![CDATA["

and end with the string

"]]>"

Page 33: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

CDATA Sections

CDSect ::= CDStart CData CDEnd

CDStart ::= '<![CDATA['

CData ::= (Char* - (Char* ']]>' Char*))

CDEnd ::= ']]>'

Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.

Page 34: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

CDATA Sections

An example of a CDATA section, in which "<greeting>" and "</greeting>" are recognized as character data, not markup:

<![CDATA[<greeting>Hello, world!</greeting>]]>
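A parser shows that the content of the CDATA section arrives as plain character data. In this Python sketch (standard library only) the wrapper element `doc` is an assumption, added because a CDATA section must appear inside an element.

```python
import xml.etree.ElementTree as ET

# The CDATA section protects the angle brackets from being read as markup.
doc = ET.fromstring("<doc><![CDATA[<greeting>Hello, world!</greeting>]]></doc>")
cdata_text = doc.text
child_count = len(doc)  # number of child elements found inside <doc>

# The same character data written with entity references instead of CDATA.
escaped = ET.fromstring(
    "<doc>&lt;greeting&gt;Hello, world!&lt;/greeting&gt;</doc>"
)
escaped_text = escaped.text
```

Both spellings deliver the same text to the application; CDATA is simply more convenient for larger blocks.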

Page 35: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

External DTD

This is the same XML document with an external DTD: 

<?xml version="1.0"?> <!DOCTYPE note SYSTEM "note.dtd"> <note> <to>Tove</to> <from>Jani</from><heading>Reminder</heading> <body>Don't forget me this weekend!</body></note>

Page 36: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

External DTD

This is a copy of the file "note.dtd" containing the Document Type Definition:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

Page 37: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML document (with DTD)

An example of an XML document with a document type declaration

<?xml version="1.0"?> <!DOCTYPE greeting SYSTEM "hello.dtd">

<greeting>Hello, world!</greeting>

The system identifier "hello.dtd" gives the address (a URI reference) of a DTD for the document

Page 38: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML document (with DTD)

The declarations can also be given locally, as in this example:

<?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]> <greeting>Hello, world!</greeting>

Page 39: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

XML document (with DTD)

If both the external and internal subsets are used, the internal subset is considered to occur before the external subset.

This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset.
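The precedence follows from a general rule: when the same entity is declared more than once, the first declaration read is the binding one. Python's standard library parser does not load external subsets, so this sketch illustrates the rule with two declarations inside a single internal subset; the entity name `who` and its values are assumptions.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ENTITY who "internal subset">
  <!ENTITY who "later declaration">
]>
<greeting>Hello, &who;!</greeting>"""

# The parser expands &who; using the first declaration it encountered.
greeting = ET.fromstring(DOC).text
```

Since the internal subset is considered to come first, its declarations win over those of the external subset in the same way.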

Page 40: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

Language identification

In document processing, it is often useful to identify the natural or formal language in which the content is written.

A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document.

In valid documents, this attribute, like any other, must be declared if it is used.

Page 41: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

Language identification

A simple declaration for xml:lang might take the form

xml:lang NMTOKEN #IMPLIED

The intent declared with xml:lang is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of xml:lang on another element within that content.

Specific default values may also be given, if appropriate. In a collection of French poems for English students, with glosses and notes in English, the xml:lang attribute might be declared this way:

<!ATTLIST poem xml:lang NMTOKEN 'fr'>
<!ATTLIST gloss xml:lang NMTOKEN 'en'>
<!ATTLIST note xml:lang NMTOKEN 'en'>

Page 42: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

Language identification

<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>

<p xml:lang="en-GB">What colour is it?</p>

<p xml:lang="en-US">What color is it?</p>

<sp who="Faust" desc='leise' xml:lang="de">
  <l>Habe nun, ach! Philosophie,</l>
  <l>Juristerei, und Medizin</l>
  <l>und leider auch Theologie</l>
  <l>durchaus studiert mit heißem Bemüh'n.</l>
</sp>
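The way xml:lang applies to an element and everything inside it, unless overridden, can be sketched in Python with the standard library. The wrapper element `poems` and the function name `effective_languages` are assumptions; `xml.etree.ElementTree` reports the attribute under the reserved XML namespace, hence the `XML_LANG` key.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined "xml" namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<poems xml:lang="en">
  <p>The quick brown fox jumps over the lazy dog.</p>
  <p xml:lang="en-GB">What colour is it?</p>
  <sp who="Faust" desc="leise" xml:lang="de">
    <l>Habe nun, ach! Philosophie,</l>
  </sp>
</poems>"""


def effective_languages(root):
    """Map each element to the language in effect for it."""
    result = {}

    def walk(element, inherited):
        # An element's own xml:lang overrides the inherited value.
        lang = element.get(XML_LANG, inherited)
        result[element] = lang
        for child in element:
            walk(child, lang)

    walk(root, None)
    return result


langs = effective_languages(ET.fromstring(DOC))
by_text = {
    el.text.strip(): lang
    for el, lang in langs.items()
    if el.text and el.text.strip()
}
```

The first `<p>` inherits "en" from its parent, while the line of verse inherits "de" from the enclosing `<sp>`.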

Page 43: Sistemi basati su conoscenza XML Prof. M.T. PAZIENZA a.a. 2004-2005.

“Well formed” XML documents

A “well formed” XML document has correct XML syntax (i.e. it is a document that conforms to the XML syntax rules).

A “valid” XML document is a “well formed” XML document which also conforms to the rules of a DTD (Document Type Definition).