XML Basics
Roberto Bruni
Dipartimento di Informatica, Università di Pisa
Models and Languages for Coordination and Orchestration
IMT- Institutions Markets Technologies - Alti Studi Lucca
Roberto Bruni @ IMT Lucca 8 March 2005
Content
- XML
- DTD
- XML Schema
XML for Web Services
- Web Services are loosely coupled software components delivered over Internet standard technologies
- Today the standard technology for interoperability is XML (like it or not)
- All WS technologies are based on XML
What is XML?
- XML (eXtensible Markup Language) is an industry-standard, text-based markup language
  - a system-independent way of representing data
  - designed to make data portable
- Data are identified using tags
  - identifiers enclosed in angle brackets <…>
  - Ex. <message>Hello World</message>
  - collectively, tags are known as “markup”
- For background and motivation for XML see "XML and the Second-Generation Web" by Jon Bosak and Tim Bray, Scientific American, May 6 1999 - http://www.sciam.com/
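The tag example above can be tried out with a few lines of Python. This is only a sketch using the standard library module xml.etree.ElementTree; the variable names are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Parse the one-element document shown on the slide.
element = ET.fromstring("<message>Hello World</message>")

tag_name = element.tag   # the identifier written between the angle brackets
content = element.text   # the data enclosed by the start-tag and end-tag

print(tag_name, "->", content)
```

The parser hands back the tag name and the enclosed data separately, which is what "identifying data with tags" amounts to in practice.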
"XML and the Second-Generation Web"
Give people a few hints, and they can figure out the rest. They can look at a list of groceries and see shopping instructions. They can look at some rows of numbers and understand the state of their bank account. Computers, of course, are not that smart; they need to be told exactly what things are, how they are related and how to deal with them.
Extensible Markup Language (XML for short) is a new language designed to do just that, to make information self-describing.
This simple-sounding change in how computers communicate has the potential to extend the Internet beyond information delivery to many other kinds of human activity.
Indeed, since XML was completed in early 1998 by the W3C, the standard has spread like wildfire through science and into industries ranging from manufacturing to medicine.
XML in Practice
- An XML document is usually stored in a (text) file with extension .xml
- Document publication
- Archiving
- Data exchange
- Data processing
- Document-driven programming
The XML Family (in part)
- XML
- XML-NameSpace
  - mechanism for disambiguating tag names
- DTD, XML Schema
  - define the structure of XML documents
- XSL, XSLT
  - style and style transformation languages
- XPointer, XLink, XBase, XPath
  - languages for hyperlinks and addressing
HTML and XML
- Like HTML (HyperText Markup Language), XML encloses data in tags
- XML tags are case sensitive (HTML tags are not)
- XML tags relate to the meaning of the enclosed text, while HTML tags tell how to display the enclosed text
- XML is extensible (you can write your own tags), while with HTML you are limited to using only predefined tags (from the HTML specification)
- XML documents must be well-formed
  - http://www.ucc.ie/xml/#FAQ-VALIDWF
- You can define classes of valid documents
  - DTD (Document Type Definition), XML Schema, and others
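A small sketch of the case-sensitivity and extensibility points, assuming Python's standard xml.etree.ElementTree module. The tag names memo, Note and note are invented for illustration; they are exactly the kind of user-defined tags XML allows.

```python
import xml.etree.ElementTree as ET

# <Note> and <note> are two different, user-defined tags in XML.
root = ET.fromstring("<memo><Note>first</Note><note>second</note></memo>")

upper = [e.text for e in root.findall("Note")]   # matches only <Note>
lower = [e.text for e in root.findall("note")]   # matches only <note>

print(upper, lower)
```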
Bad HTML: Example
Why is XML important?
- Plain text, not binary format
  - files can be created and edited with anything from standard text editors to visual development environments
  - easy to debug
  - can store any amount of data (scalability)
- Data identification
  - XML describes the kind of each piece of data
  - data are easy to search, extract, process, use
- Stylability
  - XML is inherently style free, but you can use different stylesheets to produce output in PostScript, LaTeX, PDF or other formats (even ones not invented yet!)
What Makes XML Portable?
- XML documents are written in text format
  - which is readable by both human beings and text-editing software
- A schema gives XML data its portability
  - a parser uses schemas to understand the structure of valid documents (and to validate documents)
- XML documents do not include formatting instructions
  - they can be easily displayed in various ways
A Debatable Argument
- Since identifying the data gives you some sense of
  - what it means
  - how to interpret it
  - what you should do with it
- XML is sometimes described as a mechanism for specifying the semantics (meaning) of the data!
- Advanced reading: “The essence of XML” by J. Siméon and P. Wadler, Proc. POPL 2003.
Example
<message>
  <to>[email protected]</to>
  <from>[email protected]</from>
  <subject>XML class</subject>
  <text>What is XML?</text>
</message>
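As a rough illustration of how a program reads such a document, the sketch below parses a message of the same shape with Python's xml.etree.ElementTree. The e-mail addresses are hypothetical placeholders, since the ones on the slide are not recoverable.

```python
import xml.etree.ElementTree as ET

# Same structure as the slide; the addresses are made-up placeholders.
MESSAGE = """\
<message>
  <to>you@example.org</to>
  <from>me@example.org</from>
  <subject>XML class</subject>
  <text>What is XML?</text>
</message>"""

root = ET.fromstring(MESSAGE)

# Each child element becomes one named field of the message.
fields = {child.tag: child.text for child in root}

print(fields["subject"])
```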
XML Features
- Ability for one tag to contain others gives XML its ability to represent any hierarchical data structure
- Documents can contain comments
  - XML comments look just like HTML comments
- Tags can have attributes (like HTML)
- Inline reusability
  - XML entities can be included “in line” in a document
Example Revised
<message to="[email protected]"
         from="[email protected]"
         subject="XML class">
  <!-- Revised using attributes and empty tags -->
  <text>What is XML?</text>
  <unread />
</message>
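The revised document can be read in much the same way; attributes are looked up by name and the empty element acts as a flag. A minimal sketch with Python's xml.etree.ElementTree follows, again with hypothetical placeholder addresses. Note that the default ElementTree parser skips comments.

```python
import xml.etree.ElementTree as ET

# Attribute-based variant; the addresses are made-up placeholders.
REVISED = """\
<message to="you@example.org" from="me@example.org" subject="XML class">
  <!-- Revised using attributes and empty tags -->
  <text>What is XML?</text>
  <unread/>
</message>"""

root = ET.fromstring(REVISED)

recipient = root.get("to")                     # attribute lookup by name
is_unread = root.find("unread") is not None    # the empty element is a flag
child_tags = [child.tag for child in root]     # the comment is not a child

print(recipient, is_unread, child_tags)
```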
Heuristics for Designing XML Data Structures I
- Attributes or elements?
- Forced choices for elements
  - the data contains substructures
  - the data contains multiple lines or paragraphs
  - multiple occurrences are possible
  - data changes frequently
- Forced choices for attributes
  - data is a small simple string that rarely (if ever) changes
  - DTDs are used and data is confined to a small number of fixed choices
Heuristics for Designing XML Data Structures II
- Attributes or elements?
- Stylistic criteria (a bit nebulous, as for art or music)
  - Visibility
    - if data is intended to be shown, then elements are better
    - otherwise attributes are ok
  - Containers vs characteristics
    - elements are containers
    - attributes are characteristics of containers
- More at http://www.oasis-open.org/cover/elementsAndAttrs.html
Heuristics for Designing XML Data Structures: Example
- In XML documents for slideshows
  - the type of the slide (which audience it is aimed at) is best modeled as an attribute
    - it is a characteristic, not to be shown
  - the title of the slide is part of the content and it has to be displayed
    - better to have it as an element
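One way to picture the slideshow heuristic is to build such a document programmatically. In the sketch below the names slideshow, slide, type and title, and the value "tech", are assumptions made for illustration; the slides do not fix a vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the audience type is an attribute (a characteristic),
# the title is an element (content that is meant to be displayed).
slideshow = ET.Element("slideshow")
slide = ET.SubElement(slideshow, "slide", {"type": "tech"})
title = ET.SubElement(slide, "title")
title.text = "XML Basics"

serialized = ET.tostring(slideshow, encoding="unicode")
print(serialized)
```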
The XML Prolog
- An XML file should start with a prolog (the XML declaration)
  - it has the form of a processing instruction <?app instr ?>
  - minimal: <?xml version="1.0"?>
- Attributes
  - version: XML version used in the data
  - (optional) encoding: character set used to encode data (ex. ISO-8859-1, UTF-8)
  - (optional) standalone: whether or not the document references external entities
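To see a prolog generated by a tool rather than typed by hand, the following sketch serializes a tiny document with Python's xml.etree.ElementTree and asks for the declaration explicitly. It assumes Python 3.8 or later, where ET.tostring accepts the xml_declaration argument; the element name is illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.Element("message")
root.text = "Hello World"

# Ask the serializer to emit the prolog with an explicit encoding.
document = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

first_line = document.splitlines()[0]   # the prolog, as bytes
print(first_line.decode("ascii"))
```

The serializer may use single quotes around the attribute values; both quote styles are legal in the declaration.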
XML Parser
- Document processing
  - Read phase
  - Syntax check
  - Validation (validating parsers only)
  - Error reporting (fatal errors, errors, warnings)
  - Access to data
- DOM (Document Object Model) parser
  - generates the whole tree-like data structure of the document (favours random access)
- SAX (Simple API for XML) parser
  - event-driven serial access protocol
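The difference between the two parser styles can be sketched with Python's standard library, which offers both: xml.dom.minidom builds the whole tree, while xml.sax calls back on each event. The document text and the TagCollector class are illustrative assumptions.

```python
import xml.sax
from xml.dom import minidom

DOC = "<message><to>you</to><from>me</from><text>What is XML?</text></message>"

# DOM: the whole document is loaded into a tree, then accessed at random.
dom = minidom.parseString(DOC)
dom_text = dom.getElementsByTagName("text")[0].firstChild.data


# SAX: the parser reports events serially; the handler keeps only what it needs.
class TagCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        # Called once for every start-tag, in document order.
        self.tags.append(name)


collector = TagCollector()
xml.sax.parseString(DOC.encode("utf-8"), collector)
sax_tags = collector.tags

print(dom_text, sax_tags)
```

DOM is convenient when the program needs to jump around the document; SAX keeps memory use low because nothing is stored unless the handler stores it.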
XML Browsing
- Some browsers can parse XML
  - MS Internet Explorer and Netscape Navigator use DOM parsers
- Elements can be hidden
Well-Formed Documents I
- All attribute values must be in quotes
  - The single-quote character (the apostrophe) may be used if the value contains a double-quote character, and vice versa
  - For isolated quotes as data, you can use &apos; or &quot;
  - Do not under any circumstances use the automated typographic (‘curly’) inverted commas substituted by some word processors for quoting attribute values (as in some of these PowerPoint slides!!)
- Elements must nest inside each other properly
  - no overlapping markup (same as for HTML)
- Exactly one root element (after the declaration)
- All tags must be balanced
  - every element which may contain character data or sub-elements must have both the start-tag and the end-tag present (omission is not allowed except for EMPTY elements)
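These rules can be checked mechanically, since a conforming parser must reject any document that breaks them. The sketch below uses Python's xml.etree.ElementTree; the helper well_formed and the sample snippets are hypothetical, written only to exercise each rule.

```python
import xml.etree.ElementTree as ET


def well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "quoted attribute": '<a href="x"/>',
    "apostrophes around a double quote": """<a title='say "hi"'/>""",
    "unquoted attribute": "<a href=x/>",
    "curly quotes": "<a href=\u201cx\u201d/>",
    "overlapping markup": "<b><i>text</b></i>",
    "two root elements": "<a/><b/>",
    "missing end-tag": "<p>text",
}

results = {name: well_formed(doc) for name, doc in samples.items()}

for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```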
Well-Formed Documents II
- Any EMPTY elements (like HTML's IMG, BR, HR and others) must either end with /> or they must look like non-EMPTY elements by having a real end-tag (but no content)
  - Example: <br> would become either <br/> or <br></br> (with nothing in between)
- There must not be any isolated markup-start characters (< or &) in your text data
  - They must be given as &lt; and &amp;
- The sequence ]]> may only occur as the end of a CDATA marked section
  - if you are using it for any other purpose it must be given as ]]&gt;
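In practice the escaping is usually left to a library. The sketch below, assuming Python's standard xml.sax.saxutils.escape and xml.etree.ElementTree, escapes an invented piece of text and checks that the parser recovers the original; it also shows that the two spellings of an empty element are equivalent.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text data containing markup-start characters and the ]]> sequence.
raw = "if a < b && c then ]]> done"
escaped = escape(raw)   # replaces &, < and > with entity references

# The parser turns the references back into the original characters.
round_trip = ET.fromstring("<text>" + escaped + "</text>").text

# <br/> and <br></br> describe the same empty element.
short_form = ET.tostring(ET.fromstring("<br/>"))
long_form = ET.tostring(ET.fromstring("<br></br>"))

print(escaped)
```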
Well-Formed Documents III
- XML files with no DTD are considered to have the following entities predefined and thus available for use
  - &lt; (it represents the character < )
  - &gt; (it represents the character > )
  - &apos; (it represents the character ' )
  - &quot; (it represents the character " )
  - &amp; (it represents the character & )
- With a DTD, all other entities must be declared before use; these five are always recognized, though declaring them as well is recommended for interoperability
- DTDless well-formed documents may use attributes on any element, but the attributes are all assumed to be of type CDATA.
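A quick way to see the five predefined entities at work is to parse a DTD-less document that uses them, and one that uses an entity nobody declared. This is a sketch with Python's xml.etree.ElementTree; the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET

# The five predefined entities need no declaration.
doc = "<subject>&lt;&quot;it&apos;s me&quot;&gt; &amp; co.</subject>"
text = ET.fromstring(doc).text

# Any other entity (here the HTML-style &nbsp;) is undefined without a DTD.
try:
    ET.fromstring("<p>&nbsp;</p>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False

print(text, undeclared_accepted)
```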
Parsing Errors: Example
Entities: Example
<?xml version="1.0"?>
<message>
  <from><!-- Deitel and Associates --> دايتَ لأند </from>
  <subject>&lt;&quot;it&apos;s me&quot;&gt;</subject>
</message>
Entities: Example
CDATA Sections
- When large blocks of text include many special characters it is inconvenient to use entity references
- Character Data (CDATA) sections can be used instead
  - analogous to HTML tags <pre> ... </pre>
  - they start with <![CDATA[
  - they finish with ]]>
  - characters in the middle are NOT INTERPRETED by the parser
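A short sketch of a CDATA section in use, assuming Python's xml.etree.ElementTree; the element name code and its content are invented for illustration. The characters < and & appear unescaped inside the section and still reach the application unchanged.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section, < > & and quotes need no entity references.
doc = '<code><![CDATA[if (a < b && b > c) { print("<ok>"); }]]></code>'

text = ET.fromstring(doc).text   # the raw characters, delimiters stripped

print(text)
```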